Knowledge Base

KnowledgeBase

Bases: SynalinksSaveable

A knowledge base for storing and retrieving structured data.

The KnowledgeBase provides a unified interface over two complementary stores: a SQL row/table store (DuckDB by default) and a property-graph store (LadybugDB by default). The two are orthogonal — SQL methods (update, sql, similarity_search, ...) route to the SQL adapter; graph methods (update_entities, cypher, entity_similarity_search, ...) route to the graph adapter.

A no-args KnowledgeBase() instantiates BOTH stores under synalinks_home() (database.db for SQL, database.lb for the graph) so the two sides are usable side-by-side without setup. Pass uri= alone for SQL-only, graph_uri= alone for graph-only, or both to point each side at a custom location.
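
The selection rule described above can be sketched in plain Python. This is an illustrative stand-in that mirrors the rule as stated, not the library's own code: `select_stores` and `default_paths` are hypothetical helper names, and the home directory is passed in explicitly instead of being read from `synalinks_home()`.

```python
import os


def select_stores(uri=None, graph_uri=None):
    """Return (want_sql, want_graph) for the given URI arguments."""
    # With no URIs at all, both stores are created side by side.
    auto_pair = uri is None and graph_uri is None
    want_sql = uri is not None or auto_pair
    want_graph = graph_uri is not None or auto_pair
    return want_sql, want_graph


def default_paths(home, name=None):
    """Return the default (sql_path, graph_path) under `home` (hypothetical helper)."""
    stem = name or "database"
    return (
        os.path.join(home, f"{stem}.db"),
        os.path.join(home, f"{stem}.lb"),
    )


print(select_stores())                                # both stores
print(select_stores(uri="duckdb://my_database.db"))   # SQL only
print(select_stores(graph_uri="ladybug://:memory:"))  # graph only
```

Passing an explicit URI for one side switches off the automatic pairing, so no second file appears on disk unless both URIs are given or both are omitted.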

Basic Usage
import synalinks

class Document(synalinks.DataModel):
    id: str
    title: str
    content: str

# Create a knowledge base without embeddings (full-text search only)
knowledge_base = synalinks.KnowledgeBase(
    uri="duckdb://my_database.db",
    data_models=[Document],
)

# Store a document
doc = Document(id="1", title="Hello", content="Hello World!")
await knowledge_base.update(doc.to_json_data_model())

# Retrieve by ID (the first field, here 'id', is the primary key — see
# the "Primary Key Convention" section below).
result = await knowledge_base.get("1", table_name="Document")

# Full-text search
results = await knowledge_base.fulltext_search("Hello", k=10)
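
The calls above use `await`, so outside an async-aware REPL they need to run inside a coroutine, for example via `asyncio.run`. The sketch below shows that pattern together with the upsert behaviour described on this page (records keyed on their first field, scalar in / scalar out, list in / list out). `InMemoryTable` is a hypothetical stand-in written for this illustration; it is not part of Synalinks and does no searching or persistence.

```python
import asyncio


class InMemoryTable:
    """Hypothetical stand-in: upserts dict records keyed on their first field."""

    def __init__(self):
        self.rows = {}

    async def update(self, record_or_records):
        if isinstance(record_or_records, list):
            return [await self.update(record) for record in record_or_records]
        record = record_or_records
        # Dicts preserve insertion order, so the first key plays the
        # role of the first declared field.
        key = record[next(iter(record))]
        self.rows[key] = record
        return key

    async def get(self, key):
        return self.rows.get(key)


async def main():
    table = InMemoryTable()
    first_id = await table.update(
        {"id": "1", "title": "Hello", "content": "Hello World!"}
    )
    # Same key again: the row is replaced, not duplicated.
    await table.update(
        {"id": "1", "title": "Hello again", "content": "Hello World!"}
    )
    ids = await table.update([{"id": "2", "title": "B", "content": ""}])
    return first_id, ids, await table.get("1"), len(table.rows)


result = asyncio.run(main())
print(result)
```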
Primary Key Convention

Synalinks does not inject a synthetic uuid / _id column. The primary key is the first declared field of your DataModel, in declaration order, after skipping reserved structural fields:

  • For SQL tables (DuckDB): the first property of the schema.
  • For graph entities (Ladybug nodes): the first property after label. label is the node-table name, not a column.
  • For graph relations (Ladybug edges): the first property after subj / label / obj. Those three are reserved — the endpoints are resolved against the node tables, and the label is the edge-table name.

Because the PK is just "whichever field you declared first", a KnowledgeBase can be pointed at a pre-existing DuckDB file or LadybugDB store without rewriting rows or renaming columns: declare your DataModel so its first field matches the column you already treat as the identifier (id, ticker, isbn, email, whatever it happens to be) and the adapters will use it. If you want a UUID-style key, declare it explicitly as the first field and populate it yourself — generating identifiers is the caller's job, not the framework's.
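
The convention above can be sketched as a small lookup. This is a minimal illustration of the rule as described, not the adapters' actual code: `primary_key_field` and `RESERVED_FIELDS` are hypothetical names, and returning `None` when every field is reserved is a choice made for this sketch only.

```python
# Reserved structural fields skipped before the primary key is picked.
RESERVED_FIELDS = {
    "table": (),
    "entity": ("label",),
    "relation": ("subj", "label", "obj"),
}


def primary_key_field(field_names, kind="table"):
    """Return the first declared field that is not reserved for `kind`."""
    reserved = RESERVED_FIELDS[kind]
    for field_name in field_names:
        if field_name not in reserved:
            return field_name
    # Assumption for this sketch: no candidate means no primary key.
    return None


print(primary_key_field(["id", "title", "content"]))
print(primary_key_field(["label", "name", "description"], kind="entity"))
print(primary_key_field(["subj", "label", "obj", "since"], kind="relation"))
```

Reordering the fields changes the key: declaring `ticker` before `id` would make `ticker` the identifier for a SQL table.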

With Vector Similarity Search
embedding_model = synalinks.EmbeddingModel(
    model="ollama/mxbai-embed-large"
)

knowledge_base = synalinks.KnowledgeBase(
    uri="duckdb://./my_database.db",
    data_models=[Document],
    embedding_model=embedding_model,
    metric="cosine",
)

# Hybrid search (combines BM25 fulltext + vector similarity, fused by RRF)
results = await knowledge_base.hybrid_fts_search("semantic query", k=10)
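
The comment above says the two rankings are fused by RRF. Assuming RRF here means reciprocal rank fusion, the idea can be sketched generically as below. This is not the library's implementation: `reciprocal_rank_fusion` is a hypothetical helper, and the smoothing constant `rrf_k=60` is the conventional default from the RRF literature, not a value confirmed for Synalinks. It is unrelated to the `k=10` result count in the call above.

```python
def reciprocal_rank_fusion(rankings, rrf_k=60):
    """Fuse ranked id lists; each hit contributes 1 / (rrf_k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["a", "b", "c"]    # e.g. order returned by full-text search
vector_ranking = ["b", "d", "a"]  # e.g. order returned by vector search
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Because only ranks are used, the BM25 scores and vector distances never need to be put on a common scale.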
Retrieving Table Definitions
# Get all symbolic data models (table definitions) from the database
symbolic_models = knowledge_base.get_symbolic_data_models()

for model in symbolic_models:
    print(model.get_schema())
    # {'title': 'Document', 'type': 'object', 'properties': {...}, ...}

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| uri | str | SQL store connection URI ("duckdb://path/to/db.db"). When both uri and graph_uri are omitted, defaults to {synalinks_home()}/{name or 'database'}.db. Pass uri alone to opt out of the graph-side default. | None |
| graph_uri | str | Graph store connection URI ("ladybug://path/to/graph.lb" or "ladybug://:memory:"). When both URIs are omitted, defaults to {synalinks_home()}/{name or 'database'}.lb. Pass graph_uri alone to opt out of the SQL-side default. | None |
| data_models | list | Optional list of DataModel or SymbolicDataModel classes to create tables for in the SQL store. | None |
| entity_models | list | Optional list of entity (node) models for the graph store. | None |
| relation_models | list | Optional list of relation (edge) models for the graph store. | None |
| embedding_model | EmbeddingModel | Optional embedding model for vector similarity search; forwarded to both stores. | None |
| metric | str | The distance metric for vector search. Options: "cosine", "l2sq", "ip" (default: "cosine"). | 'cosine' |
| wipe_on_start | bool | Whether to clear the database on initialization (default: False). | False |
| name | str | Optional name for the knowledge base (used for serialization and as the filename stem for the default .synalinks paths). | None |
| encryption_key | str | Optional at-rest encryption key for the SQL store. Not forwarded to the graph store (LadybugDB has no encryption-at-rest support). | None |
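
For orientation, the three `metric` names can be read against the standard definitions sketched below. The option names come from the parameter list above; the formulas are the textbook ones, and treating "cosine" as a distance (1 minus cosine similarity) is an assumption of this sketch. How each backend scores or orders results with these metrics is not specified here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2sq_distance(a, b):
    """Squared Euclidean distance ("l2sq")."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    """Inner (dot) product ("ip"); larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
candidate = [0.0, 1.0]
print(cosine_distance(query, candidate))
print(l2sq_distance(query, candidate))
print(inner_product(query, candidate))
```

Cosine ignores vector length, while "l2sq" and "ip" are sensitive to it, so the choice matters mainly when embeddings are not normalized.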
Source code in synalinks/src/knowledge_bases/knowledge_base.py
@synalinks_export("synalinks.KnowledgeBase")
class KnowledgeBase(SynalinksSaveable):
    """A knowledge base for storing and retrieving structured data.

    The KnowledgeBase provides a unified interface over two complementary
    stores: a SQL row/table store (DuckDB by default) and a property-graph
    store (LadybugDB by default). The two are orthogonal — SQL methods
    (``update``, ``sql``, ``similarity_search``, ...) route to the SQL
    adapter; graph methods (``update_entities``, ``cypher``,
    ``entity_similarity_search``, ...) route to the graph adapter.

    A no-args ``KnowledgeBase()`` instantiates BOTH stores under
    ``synalinks_home()`` (``database.db`` for SQL, ``database.lb`` for
    the graph) so the two sides are usable side-by-side without setup.
    Pass ``uri=`` alone for SQL-only, ``graph_uri=`` alone for
    graph-only, or both to point each side at a custom location.

    ### Basic Usage

    ```python
    import synalinks

    class Document(synalinks.DataModel):
        id: str
        title: str
        content: str

    # Create a knowledge base without embeddings (full-text search only)
    knowledge_base = synalinks.KnowledgeBase(
        uri="duckdb://my_database.db",
        data_models=[Document],
    )

    # Store a document
    doc = Document(id="1", title="Hello", content="Hello World!")
    await knowledge_base.update(doc.to_json_data_model())

    # Retrieve by ID (the first field, here 'id', is the primary key — see
    # the "Primary Key Convention" section below).
    result = await knowledge_base.get("1", table_name="Document")

    # Full-text search
    results = await knowledge_base.fulltext_search("Hello", k=10)
    ```

    ### Primary Key Convention

    Synalinks does not inject a synthetic ``uuid`` / ``_id`` column. The
    primary key is the **first declared field** of your DataModel, in
    declaration order, after skipping reserved structural fields:

    * For SQL tables (DuckDB): the first property of the schema.
    * For graph entities (Ladybug nodes): the first property after
      ``label``. ``label`` is the node-table name, not a column.
    * For graph relations (Ladybug edges): the first property after
      ``subj`` / ``label`` / ``obj``. Those three are reserved — the
      endpoints are resolved against the node tables, and the label is
      the edge-table name.

    Because the PK is just "whichever field you declared first", a
    KnowledgeBase can be pointed at a pre-existing DuckDB file or
    LadybugDB store without rewriting rows or renaming columns: declare
    your DataModel so its first field matches the column you already
    treat as the identifier (``id``, ``ticker``, ``isbn``, ``email``,
    whatever it happens to be) and the adapters will use it. If you
    *want* a UUID-style key, declare it explicitly as the first field
    and populate it yourself — generating identifiers is the caller's
    job, not the framework's.

    ### With Vector Similarity Search

    ```python
    embedding_model = synalinks.EmbeddingModel(
        model="ollama/mxbai-embed-large"
    )

    knowledge_base = synalinks.KnowledgeBase(
        uri="duckdb://./my_database.db",
        data_models=[Document],
        embedding_model=embedding_model,
        metric="cosine",
    )

    # Hybrid search (combines BM25 fulltext + vector similarity, fused by RRF)
    results = await knowledge_base.hybrid_fts_search("semantic query", k=10)
    ```

    ### Retrieving Table Definitions

    ```python
    # Get all symbolic data models (table definitions) from the database
    symbolic_models = knowledge_base.get_symbolic_data_models()

    for model in symbolic_models:
        print(model.get_schema())
        # {'title': 'Document', 'type': 'object', 'properties': {...}, ...}
    ```

    Args:
        uri (str): SQL store connection URI (``"duckdb://path/to/db.db"``).
            When both ``uri`` and ``graph_uri`` are omitted, defaults to
            ``{synalinks_home()}/{name or 'database'}.db``. Pass ``uri``
            alone to opt out of the graph-side default.
        graph_uri (str): Graph store connection URI
            (``"ladybug://path/to/graph.lb"`` or
            ``"ladybug://:memory:"``). When both URIs are omitted,
            defaults to ``{synalinks_home()}/{name or 'database'}.lb``.
            Pass ``graph_uri`` alone to opt out of the SQL-side default.
        data_models (list): Optional list of DataModel or SymbolicDataModel
            classes to create tables for in the SQL store.
        entity_models (list): Optional list of entity (node) models for
            the graph store.
        relation_models (list): Optional list of relation (edge) models
            for the graph store.
        embedding_model (EmbeddingModel): Optional embedding model for
            vector similarity search; forwarded to both stores.
        metric (str): The distance metric for vector search.
            Options: "cosine", "l2sq", "ip" (default: "cosine").
        wipe_on_start (bool): Whether to clear the database on initialization
            (default: False).
        name (str): Optional name for the knowledge base (used for serialization
            and as the filename stem for the default ``.synalinks`` paths).
        encryption_key (str): Optional at-rest encryption key for the SQL
            store. Not forwarded to the graph store (LadybugDB has no
            encryption-at-rest support).
    """

    def __init__(
        self,
        *,
        uri=None,
        graph_uri=None,
        data_models=None,
        entity_models=None,
        relation_models=None,
        embedding_model=None,
        metric="cosine",
        wipe_on_start=False,
        name=None,
        encryption_key=None,
        **kwargs,
    ):
        # Two adapters can coexist on a single KnowledgeBase:
        #   * `sql_adapter` — row/table store, selected by `uri`
        #     (e.g. duckdb://...). Default backend is DuckDB.
        #   * `graph_adapter` — property-graph store, selected by
        #     `graph_uri` (e.g. ladybug://...). Default backend is
        #     LadybugDB.
        # The two stores are complementary, so a no-args
        # ``KnowledgeBase()`` instantiates BOTH against the same
        # ``synalinks_home()`` directory (``database.db`` for SQL,
        # ``database.lb`` for the graph). Passing only ``uri=`` keeps
        # the call SQL-only; passing only ``graph_uri=`` keeps it
        # graph-only — explicit URIs opt out of auto-pairing so a
        # caller targeting one engine isn't surprised by a second
        # file appearing on disk.
        self.sql_adapter = None
        self.graph_adapter = None

        auto_pair = uri is None and graph_uri is None
        want_sql = uri is not None or auto_pair
        want_graph = graph_uri is not None or auto_pair

        if want_sql:
            self.sql_adapter = database_adapters.get(uri)(
                uri=uri,
                data_models=data_models,
                embedding_model=embedding_model,
                metric=metric,
                wipe_on_start=wipe_on_start,
                name=name,
                encryption_key=encryption_key,
                **kwargs,
            )

        if want_graph:
            # `encryption_key` is intentionally NOT forwarded here:
            # LadybugDB has no encryption-at-rest support. A user that
            # passes it for a dual-adapter KB gets DuckDB encryption
            # for the SQL side and an unencrypted Ladybug graph store
            # (which is the same as if they'd omitted the kwarg).
            self.graph_adapter = graph_database_adapters.get(graph_uri)(
                uri=graph_uri,
                entity_models=entity_models,
                relation_models=relation_models,
                embedding_model=embedding_model,
                metric=metric,
                wipe_on_start=wipe_on_start,
                name=name,
                **kwargs,
            )

        self.uri = uri
        self.graph_uri = graph_uri
        self.data_models = data_models or []
        self.entity_models = entity_models or []
        self.relation_models = relation_models or []
        self.embedding_model = _get_em(embedding_model)
        self.metric = metric
        self.wipe_on_start = wipe_on_start
        if not name:
            self.name = auto_name("knowledge_base")
        else:
            self.name = name
        # `encryption_key` is deliberately NOT stored on `self` — it
        # lives only inside the adapter, and only as long as the
        # adapter does. This keeps the secret out of `get_config()`,
        # off-screen during repr/print, and unreferenced by any
        # serialization path. Callers must re-supply the key when
        # constructing a new KnowledgeBase against an encrypted file.

    async def update(
        self,
        data_model_or_data_models: Union[Any, List[Any], Dataset],
        *,
        verbose="auto",
    ) -> Union[Any, List[Any]]:
        """Insert or update records in the knowledge base.

        Args:
            data_model_or_data_models (JsonDataModel | List[JsonDataModel] | Dataset):
                A single ``JsonDataModel``, a list of ``JsonDataModel`` /
                ``DataModel`` instances, or a synalinks ``Dataset``.
                The ``Dataset`` form streams the source batch-by-batch
                (one ``adapter.update`` call per yielded batch) so memory
                stays bounded for large CSV / Parquet / HuggingFace
                sources. The dataset must be inputs-only — no
                ``output_template`` — because the knowledge base stores
                records, not ``(input, target)`` pairs; pass a
                labeled dataset and you'll get a ``ValueError``.

                Upserts key off the first declared field of the model —
                see the "Primary Key Convention" section on the class
                docstring for how that's resolved (and why no UUID is
                injected).
            verbose (int | str): ``"auto"``, ``0``, ``1``, or ``2``.
                Verbosity for the ``Dataset`` path; matches the
                trainer's ``fit()`` semantics. ``"auto"`` (default)
                resolves to ``1`` when a ``Dataset`` is passed (a
                per-batch progress bar — same widget ``fit()`` uses,
                with ETA when ``len(dataset)`` is known) and is a
                no-op for the scalar / list forms, which finish in a
                single adapter call.

        Returns:
            The primary key value(s) of the inserted/updated records.
            Scalar in / scalar out; list in / list out; ``Dataset`` in /
            flat list of every batch's ids concatenated.
        """
        if isinstance(data_model_or_data_models, Dataset):
            return await self._update_from_dataset(
                data_model_or_data_models, verbose=verbose
            )
        return await self.sql_adapter.update(data_model_or_data_models)

    async def _update_from_dataset(
        self, dataset: Dataset, *, verbose="auto"
    ) -> List[Any]:
        """Stream a ``Dataset`` into the adapter one batch at a time.

        Each batch yielded by the dataset is converted to a list of
        DataModel / JsonDataModel instances and handed to
        ``adapter.update``. The returned ids from every batch are
        accumulated into one flat list — same order as the dataset
        produced them.

        Inputs-only is enforced: a dataset configured with an
        ``output_template`` represents ``(input, target)`` training
        data, which isn't what the knowledge base stores. The check is
        the dataset's public ``output_template`` attribute, not the
        per-batch tuple length — so the rejection happens upfront,
        before any rows are consumed.
        """
        if dataset.output_template is not None:
            raise ValueError(
                "KnowledgeBase.update accepts only inputs-only datasets "
                "(no `output_template`). The knowledge base stores "
                "records, not (input, target) pairs."
            )

        # "auto" → 1 in the Dataset branch (we know there's iteration to
        # display). Outside this branch, verbose is unused.
        if verbose == "auto":
            verbose = 1

        progbar = None
        if verbose:
            try:
                target = len(dataset)
            except (TypeError, NotImplementedError):
                target = None
            progbar = Progbar(target=target, verbose=verbose, unit_name="batch")

        ids: List[Any] = []
        step = 0
        for batch in dataset:
            x = batch[0]
            if len(x) == 0:
                continue
            batch_ids = await self.sql_adapter.update(list(x))
            if isinstance(batch_ids, list):
                ids.extend(batch_ids)
            else:
                ids.append(batch_ids)
            step += 1
            if progbar is not None:
                progbar.update(step, values=[("rows", len(ids))])
        if progbar is not None:
            progbar.update(step, values=[("rows", len(ids))], finalize=True)
        return ids

    async def from_csv(
        self,
        path: str,
        *,
        table_name: Optional[str] = None,
        table_description: Optional[str] = None,
        delimiter: str = ",",
        encoding: str = "utf-8",
        header: bool = True,
    ) -> Any:
        """Bulk-load a CSV file directly into the knowledge base.

        Skips the Python row pipeline entirely (no Pydantic, no Jinja,
        no per-row INSERT) and instead delegates to the database's
        native CSV reader. Roughly two orders of magnitude faster than
        ``update(CSVDataset(...))`` for non-trivial files — see
        ``benchmarks/bench_kb_ingest.py``.

        The target table's schema is inferred directly from the
        file's columns, with the first column promoted to PRIMARY
        KEY. The returned `SymbolicDataModel` is the handle
        you pass to subsequent search / get calls — you don't need
        to pre-declare a ``DataModel`` for this table.

        Use the streaming ``update(<...>Dataset(...))`` path instead
        when source rows need transformation before storage (column
        renames, derived fields, HuggingFace datasets, etc.).

        Args:
            path: Path to the CSV file.
            table_name: Target table name. Defaults to the file's stem
                (``/data/my-docs.csv`` → ``MyDocs``). Whatever value
                lands here is always normalized to PascalCase.
            table_description: Optional natural-language description
                attached to the resulting schema.
            delimiter: Field delimiter. Defaults to ``","``.
            encoding: File encoding. Defaults to ``"utf-8"``.
            header: Whether the first row is a header. Defaults to
                ``True``.

        Returns:
            The `SymbolicDataModel` for the loaded table.
        """
        return await self.sql_adapter.from_csv(
            path,
            table_name=table_name,
            table_description=table_description,
            delimiter=delimiter,
            encoding=encoding,
            header=header,
        )

    async def from_parquet(
        self,
        path: str,
        *,
        table_name: Optional[str] = None,
        table_description: Optional[str] = None,
    ) -> Any:
        """Bulk-load a Parquet file directly into the knowledge base.

        Same trade-offs as `from_csv` — bypasses the Python row
        pipeline for native database ingestion. Parquet's schema is
        explicit in the file footer so there is no type-inference
        guesswork to worry about.

        Args:
            path: Path to the Parquet file.
            table_name: Target table name. Defaults to the file's stem
                coerced to PascalCase.
            table_description: Optional schema description.

        Returns:
            The `SymbolicDataModel` for the loaded table.
        """
        return await self.sql_adapter.from_parquet(
            path, table_name=table_name, table_description=table_description
        )

    async def from_json(
        self,
        path: str,
        *,
        table_name: Optional[str] = None,
        table_description: Optional[str] = None,
    ) -> Any:
        """Bulk-load a JSON file (top-level array of objects).

        Same trade-offs as `from_csv` / `from_parquet` —
        bypasses the Python row pipeline. The file must contain a
        top-level JSON array. Use `from_jsonl` for the
        one-object-per-line NDJSON format.

        Args:
            path: Path to the JSON file.
            table_name: Target table name. Defaults to the file's stem
                coerced to PascalCase.
            table_description: Optional schema description.

        Returns:
            The `SymbolicDataModel` for the loaded table.
        """
        return await self.sql_adapter.from_json(
            path, table_name=table_name, table_description=table_description
        )

    async def from_jsonl(
        self,
        path: str,
        *,
        table_name: Optional[str] = None,
        table_description: Optional[str] = None,
    ) -> Any:
        """Bulk-load a JSON Lines (NDJSON) file.

        Same trade-offs as `from_csv` / `from_parquet`,
        and the right call for very large JSON sources that aren't
        a single array.

        Args:
            path: Path to the JSONL file.
            table_name: Target table name. Defaults to the file's stem
                coerced to PascalCase.
            table_description: Optional schema description.

        Returns:
            The `SymbolicDataModel` for the loaded table.
        """
        return await self.sql_adapter.from_jsonl(
            path, table_name=table_name, table_description=table_description
        )

    async def rename(
        self,
        source: Any,
        *,
        table_name: Optional[str] = None,
        table_description: Optional[str] = None,
    ) -> Any:
        """Rename a table and/or update its description.

        Pass at least one of ``table_name`` / ``table_description``.
        When ``table_name`` is given the underlying table is
        renamed via ``ALTER TABLE …``, the FTS / vector indexes are
        rebuilt under the new name, and the adapter's known-models
        list is updated so subsequent default-table searches find
        the table under its new identity.

        Args:
            source: ``SymbolicDataModel`` or table-name string for
                the table to rename. The string form is itself
                PascalCase-normalized, so callers can pass the
                same input they used in `from_csv` (e.g.
                ``"my-docs"``).
            table_name: New table name. Always normalized to
                PascalCase.
            table_description: Optional natural-language description
                attached to the resulting schema.

        Returns:
            A fresh `SymbolicDataModel` for the (possibly
            renamed) table.
        """
        return await self.sql_adapter.rename(
            source,
            table_name=table_name,
            table_description=table_description,
        )

    async def get(
        self,
        id_or_ids: Union[Any, List[Any]],
        *,
        table_name: str,
    ) -> Union[Optional[Any], List[Optional[Any]]]:
        """Retrieve one or more records by primary key from a single table.

        Args:
            id_or_ids: A single primary key value, or a list of values.
            table_name: Target table.

        Returns:
            A single JsonDataModel (or ``None``) when called with one id;
            a list of JsonDataModels (with ``None`` in the slots that did
            not match) when called with a list.
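
        A minimal sketch of that return-shape contract, using a plain
        dict as a stand-in for the table. The ``lookup`` helper is
        hypothetical and only illustrates the documented behaviour; it
        is not the adapter's implementation.

        ```python
        from typing import Any, Dict, List, Union

        def lookup(rows: Dict[Any, dict], id_or_ids: Union[Any, List[Any]]):
            # Hypothetical stand-in mirroring the documented return shape.
            if isinstance(id_or_ids, list):
                # List in, list out: a miss keeps its slot as None.
                return [rows.get(i) for i in id_or_ids]
            # Scalar in, scalar out: a miss is None.
            return rows.get(id_or_ids)

        rows = {"1": {"id": "1", "title": "Hello"}}
        single = lookup(rows, "1")
        many = lookup(rows, ["1", "missing"])
        ```

        Keeping ``None`` in the missed slots lets callers zip the
        result back against the ids they asked for.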
        """
        return await self.sql_adapter.get(id_or_ids, table_name=table_name)

    async def getall(
        self,
        *,
        table_name: str,
        limit: int = 50,
        offset: int = 0,
    ) -> List[Any]:
        """Retrieve all records from a table with pagination.

        Args:
            table_name: Target table.
            limit: Maximum number of records to return (default: 50).
            offset: Number of records to skip (default: 0).

        Returns:
            List of JsonDataModels.
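
        One way to walk a whole table with ``limit`` / ``offset`` is
        sketched below. ``fetch_all`` and ``fake_getall`` are
        hypothetical helpers; the fake only imitates the documented
        keyword signature so the loop can run without a database.

        ```python
        import asyncio

        async def fetch_all(getall, table_name, page_size=50):
            # getall: any coroutine function taking table_name/limit/offset.
            records, offset = [], 0
            while True:
                page = await getall(
                    table_name=table_name, limit=page_size, offset=offset
                )
                records.extend(page)
                # A short page means the table is exhausted.
                if len(page) < page_size:
                    return records
                offset += page_size

        async def fake_getall(*, table_name, limit=50, offset=0):
            # Stand-in for the real method: slices an in-memory list.
            data = list(range(120))
            return data[offset:offset + limit]

        all_rows = asyncio.run(fetch_all(fake_getall, "Document"))
        ```

        With a real knowledge base the same loop would presumably be
        driven by passing the bound ``getall`` method instead of the
        fake.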
        """
        return await self.sql_adapter.getall(
            table_name=table_name, limit=limit, offset=offset
        )

    async def delete(
        self,
        id_or_ids: Union[Any, List[Any]],
        *,
        table_name: str,
    ) -> int:
        """Delete records by primary key from a single table.

        Pass a single id or a list. The FTS / vector indexes for the
        table are rebuilt afterwards so subsequent search calls
        don't return ghost rows.

        Args:
            id_or_ids: Primary key value, or a list of values.
            table_name: Target table.

        Returns:
            The number of rows actually deleted (0 if no id matched).
        """
        return await self.sql_adapter.delete(id_or_ids, table_name=table_name)

    async def drop_table(self, table_name: str) -> bool:
        """Drop a table from the knowledge base.

        Removes the table's rows, FTS index, and HNSW vector index,
        then drops the table itself. Also forgets the table in the
        adapter's known-models list.

        Args:
            table_name: Target table.

        Returns:
            ``True`` if a table was dropped, ``False`` if it didn't
            exist to begin with.
        """
        return await self.sql_adapter.drop_table(table_name)

    async def sql(
        self,
        sql: str,
        *,
        params: Optional[Dict[str, Any]] = None,
        output_format: str = "json",
        **kwargs,
    ) -> Union[List[Dict[str, Any]], str]:
        """Execute a raw SQL query against the knowledge base.

        Counterpart of `cypher` — the method is named after the
        query language so a dual-adapter KnowledgeBase has a clear
        per-language entry point.

        Args:
            sql (str): The SQL string to execute.
            params (dict): Optional mapping of parameter names to values
                for parameterized queries.
            output_format: ``"json"`` (default, list of dicts —
                JSON-shaped Python data) or ``"csv"`` (CSV string,
                useful when handing the result to an LM).
            **kwargs (Any): Additional options. The most important one is
                ``read_only=True/False``. When ``True`` (the DuckDB adapter's
                default) two layers of defence apply:

                1. The SQL is parsed with the engine's own parser and any
                   non-``SELECT`` statement is rejected. This catches
                   multi-statement injection (e.g. ``SELECT 1; DROP TABLE x``),
                   ``COPY ... TO 'file'`` exfiltration, ``ATTACH``, ``EXPORT``,
                   and other side-effecting statements. This is the only
                   layer that blocks writes — the adapter's underlying
                   connection is read-write (one connection per adapter,
                   reused across operations), so the parser check is what
                   keeps untrusted SQL read-only.
                2. ``enable_external_access`` is disabled on that connection
                   at construction time, so ``SELECT`` table functions that
                   touch the host filesystem or network — ``read_csv``,
                   ``read_parquet``, ``read_json``, ``read_blob``,
                   ``read_text``, ``glob`` and the httpfs/S3 variants —
                   return a permission error instead of leaking files.
                   Without this layer,
                   ``SELECT * FROM read_csv('/etc/passwd', ...)`` would pass
                   defence (1) because it is a syntactically valid ``SELECT``.

                Pass ``read_only=False`` only from trusted call sites that
                genuinely need to mutate state. Those paths still run on
                the same sandboxed connection (no external I/O), but they
                bypass the parser check, so any SQL is accepted — keep them
                out of the LM-tool-call surface.

        Returns:
            (Union[List[Dict[str, Any]], str]): A list of dicts when
                ``output_format="json"``, or a CSV string when
                ``output_format="csv"``.
        """
        return await self.sql_adapter.sql(
            sql, params=params, output_format=output_format, **kwargs
        )

    async def similarity_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        table_name: str,
        k: int = 10,
        threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        output_format: str = "json",
    ):
        """Vector similarity search against a single table.

        Args:
            text_or_texts: Query text or list of query texts.
            table_name: Target table (single-table search).
            k: Maximum number of results to return.
            threshold: Optional maximum vector-distance threshold.
            ef_search: HNSW search-time candidate-list depth.
                ``None`` keeps the index-time value (or the engine
                default). Higher = better recall, slower query.
            output_format: ``"json"`` (default, list of dicts —
                JSON-shaped Python data) or ``"csv"`` (CSV string,
                useful for handing results to an LM since CSV typically
                takes ~30-50% fewer tokens than the equivalent JSON).
        """
        return await self.sql_adapter.similarity_search(
            text_or_texts,
            table_name=table_name,
            k=k,
            threshold=threshold,
            ef_search=ef_search,
            output_format=output_format,
        )

    async def fulltext_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        table_name: str,
        k: int = 10,
        threshold: Optional[float] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        bm25_k: Optional[float] = None,
        output_format: str = "json",
    ):
        """BM25 full-text search against a single table.

        Args:
            text_or_texts: Query text or list of query texts.
            table_name: Target table.
            k: Maximum number of results.
            threshold: Optional minimum BM25 score.
            conjunctive: AND-mode query (every term must match).
                Default ``False`` keeps OR semantics.
            bm25_b: Optional override for BM25's ``b`` parameter
                (document-length normalization).
            bm25_k: Optional override for BM25's ``k1`` parameter
                (term-frequency saturation).
            output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
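
        To see what ``bm25_b`` and ``bm25_k`` tune, here is a sketch of
        the textbook Okapi BM25 score for a single term. The function
        name, the IDF variant, and the defaults (``k1=1.2``,
        ``b=0.75``) are conventional assumptions; the engine's exact
        formula may differ.

        ```python
        import math

        def bm25_term_score(
            tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75
        ):
            # Inverse document frequency (one common smoothed variant).
            idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
            # b scales how strongly long documents are penalized.
            length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
            # k1 controls how quickly repeated terms stop adding score.
            return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)

        common = dict(tf=2, avg_doc_len=100, n_docs=1000, doc_freq=10)
        short_doc = bm25_term_score(doc_len=50, **common)
        long_doc = bm25_term_score(doc_len=400, **common)
        # b=0 switches length normalization off entirely.
        no_norm = bm25_term_score(doc_len=400, b=0.0, **common)
        ```

        With the same term frequency, the shorter document scores
        higher, and setting ``b`` to 0 removes the length penalty.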
        """
        return await self.sql_adapter.fulltext_search(
            text_or_texts,
            table_name=table_name,
            k=k,
            threshold=threshold,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            bm25_k=bm25_k,
            output_format=output_format,
        )

    async def regex_search(
        self,
        pattern: str,
        *,
        table_name: str,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        k: int = 10,
        output_format: str = "json",
    ):
        """Find rows whose string fields match a regular expression.

        DuckDB evaluates regexes with RE2, so patterns are linear-time
        and not vulnerable to catastrophic backtracking.

        Args:
            pattern: The regex pattern (RE2 syntax).
            table_name: Target table.
            fields: Field names to match against. Defaults to every
                string field on the schema. Names are snake_case-
                normalized to match stored column names.
            case_sensitive: When ``False``, match case-insensitively.
            k: Maximum number of results.
            output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
        """
        return await self.sql_adapter.regex_search(
            pattern,
            table_name=table_name,
            fields=fields,
            case_sensitive=case_sensitive,
            k=k,
            output_format=output_format,
        )

    async def hybrid_fts_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        keywords: Optional[Union[str, List[str]]] = None,
        table_name: str,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        fulltext_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        bm25_k: Optional[float] = None,
        output_format: str = "json",
    ):
        """Reciprocal-Rank-Fusion of vector similarity + BM25 fulltext.

        Falls back to full-text-only when no embedding model is
        configured. The regex-side sibling is
        `hybrid_regex_search`.

        Args:
            text_or_texts: Query text or list of query texts.
            keywords: Optional keyword string (or list of strings),
                forwarded unchanged to the adapter.
            table_name: Target table.
            k: Maximum results.
            k_rank: RRF smoothing constant. Lower values weight the
                top ranks more heavily (default: 60).
            similarity_threshold: Optional vector-distance threshold.
            fulltext_threshold: Optional BM25 threshold.
            ef_search: Forwarded to the vector branch; HNSW
                search-time candidate-list depth.
            conjunctive: Forwarded to the BM25 branch; AND-mode query.
            bm25_b: Forwarded to the BM25 branch; document-length
                normalization override.
            bm25_k: Forwarded to the BM25 branch; term-frequency
                saturation override.
            output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
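
        As an illustration of how ``k_rank`` enters the fusion, the
        standard RRF formula gives each row ``1 / (k_rank + rank)`` per
        ranked list it appears in and sums the contributions. The
        ``rrf_fuse`` helper below is a hypothetical sketch of that
        formula, not the adapter's code.

        ```python
        from collections import defaultdict

        def rrf_fuse(rankings, k_rank=60, k=10):
            # rankings: list of ranked id lists, best hit first.
            scores = defaultdict(float)
            for ranking in rankings:
                for rank, row_id in enumerate(ranking, start=1):
                    scores[row_id] += 1.0 / (k_rank + rank)
            # Highest fused score first, truncated to k results.
            ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
            return ordered[:k]

        vector_hits = ["a", "b", "c"]
        bm25_hits = ["b", "d", "a"]
        fused = rrf_fuse([vector_hits, bm25_hits])
        fused_ids = [row_id for row_id, _ in fused]
        ```

        Rows found by both branches (``"a"`` and ``"b"`` here) collect
        two contributions and so rise above rows found by only one.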
        """
        return await self.sql_adapter.hybrid_fts_search(
            text_or_texts=text_or_texts,
            table_name=table_name,
            keywords=keywords,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            fulltext_threshold=fulltext_threshold,
            ef_search=ef_search,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            bm25_k=bm25_k,
            output_format=output_format,
        )

    async def hybrid_search(self, *args, **kwargs):
        """Deprecated alias of `hybrid_fts_search`.

        Kept for backwards compatibility. The new name is symmetric
        with `hybrid_regex_search`; prefer it in new code.
        """
        return await self.hybrid_fts_search(*args, **kwargs)

    async def hybrid_regex_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        pattern_or_patterns: Union[str, List[str], None] = None,
        table_name: str,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        output_format: str = "json",
    ):
        """Reciprocal-Rank-Fusion of vector similarity + regex.

        The regex-side counterpart to `hybrid_fts_search` (which
        pairs vector with BM25 fulltext). The two signals are
        orthogonal: vectors capture semantic similarity, regex
        captures exact textual shape. Ranks are fused with the same
        RRF formula.

        Args:
            text_or_texts: Natural-language query (or list) for the
                vector side.
            pattern_or_patterns: RE2 pattern (or list) for the regex
                side. ``None`` falls back to plain similarity search.
            table_name: Target table.
            k: Maximum results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Vector-distance threshold.
            ef_search: Forwarded to the vector branch; HNSW
                search-time candidate-list depth.
            fields: Forwarded to the regex side.
            case_sensitive: Forwarded to the regex side.
            output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
        """
        return await self.sql_adapter.hybrid_regex_search(
            text_or_texts=text_or_texts,
            pattern_or_patterns=pattern_or_patterns,
            table_name=table_name,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            ef_search=ef_search,
            fields=fields,
            case_sensitive=case_sensitive,
            output_format=output_format,
        )

    # ---------------------------------------------------------------------
    # Graph store API — orthogonal to the SQL store above.
    #
    # These methods require the underlying adapter to be a
    # ``GraphDatabaseAdapter`` (selected by the URI scheme, e.g.
    # ``ladybug://``). Calling them on a SQL-only KnowledgeBase raises
    # ``NotImplementedError`` with a clear message instead of an opaque
    # ``AttributeError``.
    # ---------------------------------------------------------------------

    def _require_graph_adapter(self) -> None:
        """Raise if no graph adapter is attached to this KnowledgeBase.

        The graph adapter is absent when the KnowledgeBase was built
        SQL-only (``uri=`` passed without ``graph_uri=``); calling a
        graph method on such a KB must fail with a clear message
        instead of an ``AttributeError`` from accessing ``None``.
        """
        if not isinstance(self.graph_adapter, GraphDatabaseAdapter):
            raise NotImplementedError(
                "Graph operations require a graph database adapter "
                "(pass graph_uri='ladybug://...' at construction time)."
            )

    async def update_entities(
        self,
        entity_or_entities: Union[Any, List[Any]],
    ) -> Union[Any, List[Any]]:
        """Insert or update one or more entities (nodes) in the graph.

        Graph-side counterpart of the SQL `update`. The name
        mirrors the `Entities` data model; pass either a single
        ``Entity`` or a list — the return shape matches the input.

        Args:
            entity_or_entities: An ``Entity`` instance, or a list of
                them (or anything satisfying ``is_entity``).

        Returns:
            The node id(s) assigned by the backend. Scalar in / scalar
            out; list in / list out.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.update_entities(entity_or_entities)

    async def update_relations(
        self,
        relation_or_relations: Union[Any, List[Any]],
    ) -> Union[Any, List[Any]]:
        """Insert or update one or more relations (edges) in the graph.

        Mirrors the `Relations` data model. Each relation's
        ``subj`` and ``obj`` are upserted as needed so every edge has
        both endpoints.

        Args:
            relation_or_relations: A ``Relation`` instance, or a list
                of them (or anything satisfying ``is_relation``).

        Returns:
            The edge id(s) assigned by the backend. Scalar in / scalar
            out; list in / list out.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.update_relations(relation_or_relations)

    async def update_knowledge_graph(self, knowledge_graph: Any) -> Any:
        """Bulk-insert a full knowledge graph (entities + relations).

        Equivalent to calling `update_entities` then
        `update_relations`, but concrete adapters may optimize
        the combined path.

        Args:
            knowledge_graph: A ``KnowledgeGraph`` instance.

        Returns:
            A dict with ``{"entities": [...ids...], "relations":
            [...ids...]}``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.update_knowledge_graph(knowledge_graph)

    async def get_entity(
        self,
        id_or_ids: Union[Any, List[Any]],
        *,
        label: str,
    ) -> Union[Optional[Any], List[Optional[Any]]]:
        """Retrieve one or more entities by primary key from a label.

        Args:
            id_or_ids: A single primary key value, or a list of values.
            label: The entity label (node type).

        Returns:
            A single ``JsonDataModel`` (or ``None``) for a scalar
            argument; a list (with ``None`` for misses) for a list
            argument.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.get_entity(id_or_ids, label=label)

    async def delete_entity(
        self,
        id_or_ids: Union[Any, List[Any]],
        *,
        label: str,
    ) -> int:
        """Delete entities by primary key from a label.

        Incident relations are removed by the adapter.

        Args:
            id_or_ids: Primary key value, or a list of values.
            label: The entity label.

        Returns:
            The number of entities actually deleted.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.delete_entity(id_or_ids, label=label)

    async def delete_relation(
        self,
        *,
        label: str,
        source_id: Any,
        target_id: Any,
    ) -> int:
        """Delete a relation between two entities.

        Args:
            label: The relation label.
            source_id: The subject (source) entity's primary key.
            target_id: The object (target) entity's primary key.

        Returns:
            The number of edges actually deleted.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.delete_relation(
            label=label, source_id=source_id, target_id=target_id
        )

    async def cypher(
        self,
        query: str,
        *,
        params: Optional[Dict[str, Any]] = None,
        output_format: str = "json",
        **kwargs: Any,
    ) -> Union[List[Dict[str, Any]], str]:
        """Execute a raw Cypher query against the graph.

        The graph-store counterpart to `sql` (which executes SQL).
        Kept under a distinct name to avoid ambiguity, since the
        KnowledgeBase exposes both surfaces.

        Args:
            query: The Cypher query string.
            params: Optional parameters for parameterized queries.
            output_format: ``"json"`` (default) or ``"csv"``.
            **kwargs: Adapter-specific options (e.g. ``read_only``).

        Returns:
            A list of dicts when ``output_format="json"``, or a CSV
            string when ``output_format="csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.cypher(
            query, params=params, output_format=output_format, **kwargs
        )

    async def entity_similarity_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        label: str,
        k: int = 10,
        threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        output_format: str = "json",
    ):
        """Vector similarity search over entities of a given label.

        Args:
            text_or_texts: Query text or list of query texts.
            label: The entity label to search within.
            k: Maximum number of results.
            threshold: Optional vector-distance threshold.
            ef_search: Engine-specific search-time recall knob (HNSW
                ``efs``). Higher = better recall but slower.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.entity_similarity_search(
            text_or_texts,
            label=label,
            k=k,
            threshold=threshold,
            ef_search=ef_search,
            output_format=output_format,
        )

    async def entity_fulltext_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        label: str,
        k: int = 10,
        threshold: Optional[float] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """BM25 full-text search over entities of a given label.

        Args:
            text_or_texts: Query text or list of query texts.
            label: The entity label to search within.
            k: Maximum number of results.
            threshold: Optional minimum BM25 score.
            conjunctive: AND-mode query (every term must match).
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.entity_fulltext_search(
            text_or_texts,
            label=label,
            k=k,
            threshold=threshold,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def entity_regex_search(
        self,
        pattern: str,
        *,
        label: str,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        k: int = 10,
        output_format: str = "json",
    ):
        """Regex search over entities of a label.

        Graph-side counterpart of `regex_search`. Applies the
        pattern to every indexed string field on the entity (or to
        the caller-supplied subset via ``fields``) and returns the
        rows in which at least one of those fields matches.

        Args:
            pattern: The regex pattern.
            label: The entity label to search within.
            fields: Optional whitelist of fields.
            case_sensitive: When ``False``, matches case-insensitively.
            k: Maximum number of rows.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.entity_regex_search(
            pattern,
            label=label,
            fields=fields,
            case_sensitive=case_sensitive,
            k=k,
            output_format=output_format,
        )

    async def entity_hybrid_regex_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        pattern_or_patterns: Optional[Union[str, List[str]]] = None,
        label: str,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        output_format: str = "json",
    ):
        """RRF fusion of vector similarity + regex match over entities.

        Sibling of `entity_hybrid_fts_search`. Falls back to
        `entity_similarity_search` when no patterns are
        supplied, and to `entity_regex_search` when no
        embedding model is configured.

        Args:
            text_or_texts: Query text or list of query texts for the
                vector branch.
            pattern_or_patterns: Regex pattern (or list) for the
                regex branch. ``None`` skips the regex side.
            label: The entity label.
            fields: Forwarded to `entity_regex_search`.
            case_sensitive: Forwarded to `entity_regex_search`.
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.entity_hybrid_regex_search(
            text_or_texts=text_or_texts,
            pattern_or_patterns=pattern_or_patterns,
            label=label,
            fields=fields,
            case_sensitive=case_sensitive,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            output_format=output_format,
        )

    async def entity_hybrid_fts_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        keywords: Optional[Union[str, List[str]]] = None,
        label: str,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        fulltext_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """RRF of vector similarity + BM25 fulltext over entities of a label.

        Graph-side counterpart of `hybrid_fts_search`.

        Args:
            text_or_texts: Query text or list of query texts.
            keywords: Optional keyword string (or list of strings),
                forwarded unchanged to the adapter.
            label: The entity label to search within.
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            fulltext_threshold: Optional BM25 threshold.
            ef_search: HNSW ``efs`` knob for the vector branch.
            conjunctive: AND vs OR for the BM25 branch.
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.entity_hybrid_fts_search(
            text_or_texts=text_or_texts,
            label=label,
            keywords=keywords,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            fulltext_threshold=fulltext_threshold,
            ef_search=ef_search,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def relation_similarity_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        label: str,
        k: int = 10,
        threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        output_format: str = "json",
    ):
        """Vector similarity search over relations of a given label.

        The query text matches against BOTH endpoints (subject and
        object); the adapter returns one row per matched edge with
        its best (lowest) distance and a ``matched_on`` tag
        (``"subj"``, ``"obj"``, or ``"both"``).

        Args:
            text_or_texts: Query text or list of query texts.
            label: The relation label to search within.
            k: Maximum number of results.
            threshold: Optional vector-distance threshold per endpoint.
            ef_search: HNSW ``efs`` knob applied to both endpoint
                vector searches.
            output_format: ``"json"`` (default) or ``"csv"``.
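
        The per-edge reduction described above can be sketched as
        follows. ``merge_endpoint_hits`` and the ``distance`` key are
        hypothetical names chosen for illustration; only the
        ``matched_on`` values come from this docstring.

        ```python
        def merge_endpoint_hits(subj_distance, obj_distance):
            # Each argument is a distance, or None if that endpoint missed.
            if subj_distance is not None and obj_distance is not None:
                # Both matched: keep the best (lowest) distance.
                best = min(subj_distance, obj_distance)
                return {"distance": best, "matched_on": "both"}
            if subj_distance is not None:
                return {"distance": subj_distance, "matched_on": "subj"}
            if obj_distance is not None:
                return {"distance": obj_distance, "matched_on": "obj"}
            # Neither endpoint matched: the edge is not returned.
            return None

        both = merge_endpoint_hits(0.42, 0.17)
        subj_only = merge_endpoint_hits(0.42, None)
        ```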
        """
        self._require_graph_adapter()
        return await self.graph_adapter.relation_similarity_search(
            text_or_texts,
            label=label,
            k=k,
            threshold=threshold,
            ef_search=ef_search,
            output_format=output_format,
        )

    async def relation_fulltext_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        label: str,
        k: int = 10,
        threshold: Optional[float] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """BM25 fulltext search over relations of a given label.

        Matching is an either-endpoint union: an edge surfaces if
        either endpoint matched, and its final ``score`` is the sum
        of the subject-side and object-side BM25 scores.

        Args:
            text_or_texts: Query text or list of query texts.
            label: The relation label to search within.
            k: Maximum number of results.
            threshold: Optional minimum BM25 threshold applied per endpoint.
            conjunctive: AND-mode query (every term must match).
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.relation_fulltext_search(
            text_or_texts,
            label=label,
            k=k,
            threshold=threshold,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def relation_regex_search(
        self,
        pattern: str,
        *,
        label: str,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        k: int = 10,
        output_format: str = "json",
    ):
        """Regex search over relations of a given label.

        Composed via `entity_regex_search` on each endpoint.
        Regex hits are binary; the row's ``score`` is 2.0 when both
        endpoints matched and 1.0 when only one did, with
        ``matched_on`` indicating the side(s).

        Args:
            pattern: The regex pattern.
            label: The relation label to search within.
            fields: Optional whitelist of fields, applied to both endpoints.
            case_sensitive: When ``False``, matches case-insensitively.
            k: Maximum number of rows.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.relation_regex_search(
            pattern,
            label=label,
            fields=fields,
            case_sensitive=case_sensitive,
            k=k,
            output_format=output_format,
        )

    async def relation_hybrid_regex_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        pattern_or_patterns: Optional[Union[str, List[str]]] = None,
        label: str,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        output_format: str = "json",
    ):
        """RRF of vector similarity + regex match over relations.

        Per matched edge, the final ``rrf_score`` is the sum of the
        subject's and the object's hybrid scores — same 4-source-RRF
        reduction as `relation_hybrid_fts_search`. Falls back
        to `relation_similarity_search` when no patterns are
        supplied.

        Args:
            text_or_texts: Query text or list of query texts for the vector branch.
            pattern_or_patterns: Regex pattern (or list) for the regex branch.
            label: The relation label.
            fields: Forwarded to `entity_regex_search`.
            case_sensitive: Forwarded to `entity_regex_search`.
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.relation_hybrid_regex_search(
            text_or_texts=text_or_texts,
            pattern_or_patterns=pattern_or_patterns,
            label=label,
            fields=fields,
            case_sensitive=case_sensitive,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            output_format=output_format,
        )

    async def relation_hybrid_fts_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        keywords: Optional[Union[str, List[str]]] = None,
        label: str,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        fulltext_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """RRF of vector + BM25 fulltext over relations of a label.

        Either-endpoint union: per matched edge, the final
        ``rrf_score`` is the sum of the subject-side and
        object-side hybrid scores — equivalent to a 4-source RRF.
        Falls back to fulltext-only when no embedding model is
        configured.

        Args:
            text_or_texts: Query text or list of query texts.
            keywords: Optional keyword (or list of keywords); forwarded
                as-is to the graph adapter.
            label: The relation label to search within.
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            fulltext_threshold: Optional BM25 score threshold.
            ef_search: HNSW ``efs`` knob for the vector branch.
            conjunctive: AND vs OR for the BM25 branch.
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.relation_hybrid_fts_search(
            text_or_texts=text_or_texts,
            label=label,
            keywords=keywords,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            fulltext_threshold=fulltext_threshold,
            ef_search=ef_search,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def path_hybrid_fts_search(
        self,
        subj_text_or_texts: Union[str, List[str]],
        obj_text_or_texts: Union[str, List[str]],
        *,
        subj_keywords: Optional[Union[str, List[str]]] = None,
        obj_keywords: Optional[Union[str, List[str]]] = None,
        subj_label: str,
        obj_label: str,
        label: Optional[str] = None,
        min_hops: int = 1,
        max_hops: int = 3,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        fulltext_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """Hybrid variable-length path search where BOTH endpoints match.

        AND-semantics. Each side is hybrid-searched (vec + fts)
        independently; per matching path the ``rrf_score`` is the
        sum of the subject-side and object-side hybrid scores.
        Falls back to fulltext-only when no embedding model is
        configured.

        Args:
            subj_text_or_texts: Query text (or list) for the subject.
            obj_text_or_texts: Query text (or list) for the object.
            subj_keywords: Optional keyword (or list) for the subject;
                forwarded as-is to the graph adapter.
            obj_keywords: Optional keyword (or list) for the object;
                forwarded as-is to the graph adapter.
            subj_label: Entity label of the subject endpoint.
            obj_label: Entity label of the object endpoint.
            label: Optional rel-label constraint for every hop.
            min_hops: Minimum hop count, inclusive (default: 1).
            max_hops: Maximum hop count, inclusive (default: 3).
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            fulltext_threshold: Optional BM25 score threshold.
            ef_search: HNSW ``efs`` knob applied to both endpoints.
            conjunctive: AND vs OR for the BM25 branch.
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.path_hybrid_fts_search(
            subj_text_or_texts=subj_text_or_texts,
            obj_text_or_texts=obj_text_or_texts,
            subj_label=subj_label,
            obj_label=obj_label,
            subj_keywords=subj_keywords,
            obj_keywords=obj_keywords,
            label=label,
            min_hops=min_hops,
            max_hops=max_hops,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            fulltext_threshold=fulltext_threshold,
            ef_search=ef_search,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def path_similarity_search(
        self,
        subj_text_or_texts: Union[str, List[str]],
        obj_text_or_texts: Union[str, List[str]],
        *,
        subj_label: str,
        obj_label: str,
        label: Optional[str] = None,
        min_hops: int = 1,
        max_hops: int = 3,
        k: int = 10,
        subj_threshold: Optional[float] = None,
        obj_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        output_format: str = "json",
    ):
        """Variable-length path search where BOTH endpoints match.

        Returns paths of ``min_hops..max_hops`` edges whose start
        node is vector-close to ``subj_text_or_texts`` AND whose
        end node is vector-close to ``obj_text_or_texts``. ``label``
        is an optional rel-label constraint applied to every hop;
        when omitted, any edge type is allowed.

        Each row carries the full path: ``nodes`` (every node along
        the way, endpoints included), ``rels`` (every edge), and
        ``length`` (hop count), alongside the two endpoint distances
        and flattened endpoint PKs.

        Args:
            subj_text_or_texts: Query text (or list) for the subject.
            obj_text_or_texts: Query text (or list) for the object.
            subj_label: Entity label of the subject endpoint.
            obj_label: Entity label of the object endpoint.
            label: Optional rel-label constraint for every hop.
            min_hops: Minimum hop count, inclusive (default: 1).
            max_hops: Maximum hop count, inclusive (default: 3).
            k: Maximum number of results.
            subj_threshold: Optional subject-side distance threshold.
            obj_threshold: Optional object-side distance threshold.
            ef_search: Optional HNSW search-depth knob for the vector lookups.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.path_similarity_search(
            subj_text_or_texts,
            obj_text_or_texts,
            subj_label=subj_label,
            obj_label=obj_label,
            label=label,
            min_hops=min_hops,
            max_hops=max_hops,
            k=k,
            subj_threshold=subj_threshold,
            obj_threshold=obj_threshold,
            ef_search=ef_search,
            output_format=output_format,
        )

    async def path_fulltext_search(
        self,
        subj_text_or_texts: Union[str, List[str]],
        obj_text_or_texts: Union[str, List[str]],
        *,
        subj_label: str,
        obj_label: str,
        label: Optional[str] = None,
        min_hops: int = 1,
        max_hops: int = 3,
        k: int = 10,
        threshold: Optional[float] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """BM25 variable-length path search, AND semantics.

        Same shape as `path_similarity_search` but driven by BM25
        fulltext on each endpoint. Per matched path, ``score`` is the
        sum of the subject-side and object-side BM25 scores.

        Args:
            subj_text_or_texts: Keyword query (or list) for the subject.
            obj_text_or_texts: Keyword query (or list) for the object.
            subj_label: Entity label of the subject endpoint.
            obj_label: Entity label of the object endpoint.
            label: Optional rel-label constraint for every hop.
            min_hops: Minimum hop count, inclusive (default: 1).
            max_hops: Maximum hop count, inclusive (default: 3).
            k: Maximum number of results.
            threshold: Optional minimum BM25 score, applied per endpoint.
            conjunctive: AND-mode BM25 query.
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.path_fulltext_search(
            subj_text_or_texts=subj_text_or_texts,
            obj_text_or_texts=obj_text_or_texts,
            subj_label=subj_label,
            obj_label=obj_label,
            label=label,
            min_hops=min_hops,
            max_hops=max_hops,
            k=k,
            threshold=threshold,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def path_regex_search(
        self,
        subj_pattern: str,
        obj_pattern: str,
        *,
        subj_label: str,
        obj_label: str,
        label: Optional[str] = None,
        min_hops: int = 1,
        max_hops: int = 3,
        k: int = 10,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        output_format: str = "json",
    ):
        """Regex variable-length path search, AND semantics.

        Both endpoints must match their respective regex pattern.
        Regex is binary; ranking is by path length (shorter first).

        Args:
            subj_pattern: Regex pattern for the subject endpoint.
            obj_pattern: Regex pattern for the object endpoint.
            subj_label: Entity label of the subject endpoint.
            obj_label: Entity label of the object endpoint.
            label: Optional rel-label constraint for every hop.
            min_hops: Minimum hop count, inclusive (default: 1).
            max_hops: Maximum hop count, inclusive (default: 3).
            k: Maximum number of results.
            fields: Optional whitelist of fields, applied to both endpoints.
            case_sensitive: When ``False``, matches case-insensitively.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.path_regex_search(
            subj_pattern=subj_pattern,
            obj_pattern=obj_pattern,
            subj_label=subj_label,
            obj_label=obj_label,
            label=label,
            min_hops=min_hops,
            max_hops=max_hops,
            k=k,
            fields=fields,
            case_sensitive=case_sensitive,
            output_format=output_format,
        )

    async def path_hybrid_regex_search(
        self,
        subj_text_or_texts: Union[str, List[str]],
        obj_text_or_texts: Union[str, List[str]],
        *,
        subj_pattern_or_patterns: Optional[Union[str, List[str]]] = None,
        obj_pattern_or_patterns: Optional[Union[str, List[str]]] = None,
        subj_label: str,
        obj_label: str,
        label: Optional[str] = None,
        min_hops: int = 1,
        max_hops: int = 3,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        output_format: str = "json",
    ):
        """RRF of vector + regex variable-length path search, AND semantics.

        Each side is hybrid-searched (vec + regex) independently; the
        path's ``rrf_score`` is the sum of the two endpoint hybrid
        scores. Falls back to `path_similarity_search` when
        no patterns are supplied.

        Args:
            subj_text_or_texts: Query text (or list) for the subject vector branch.
            obj_text_or_texts: Query text (or list) for the object vector branch.
            subj_pattern_or_patterns: Regex pattern (or list) for the subject.
            obj_pattern_or_patterns: Regex pattern (or list) for the object.
            subj_label: Entity label of the subject endpoint.
            obj_label: Entity label of the object endpoint.
            label: Optional rel-label constraint for every hop.
            min_hops: Minimum hop count, inclusive (default: 1).
            max_hops: Maximum hop count, inclusive (default: 3).
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            fields: Forwarded to the regex branch.
            case_sensitive: Forwarded to the regex branch.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.path_hybrid_regex_search(
            subj_text_or_texts=subj_text_or_texts,
            obj_text_or_texts=obj_text_or_texts,
            subj_pattern_or_patterns=subj_pattern_or_patterns,
            obj_pattern_or_patterns=obj_pattern_or_patterns,
            subj_label=subj_label,
            obj_label=obj_label,
            label=label,
            min_hops=min_hops,
            max_hops=max_hops,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            fields=fields,
            case_sensitive=case_sensitive,
            output_format=output_format,
        )

    def get_symbolic_data_models(self) -> List[Any]:
        """Retrieve all symbolic data models (table definitions) from the database.

        Returns a list of SymbolicDataModel objects representing each table
        in the database. This is useful for introspecting the database schema
        or for passing to search methods to limit the search scope.

        Returns:
            list: List of symbolic data models representing the database tables.

        Example:
            ```python
            symbolic_models = knowledge_base.get_symbolic_data_models()
            for model in symbolic_models:
                schema = model.get_schema()
                print(f"Table: {schema['title']}")
                print(f"Fields: {list(schema['properties'].keys())}")
            ```
        """
        return self.sql_adapter.get_symbolic_data_models()

    def get_symbolic_entities(self) -> List[Any]:
        """Retrieve a ``SymbolicDataModel`` per node label in the graph.

        Graph-side counterpart of `get_symbolic_data_models`,
        split by graph role: returns only entity (node) schemas.
        Each schema carries a ``label`` ``const`` discriminator and
        one property per stored column.

        Returns:
            list[SymbolicDataModel]: one per existing node label.
        """
        self._require_graph_adapter()
        return self.graph_adapter.get_symbolic_entities()

    def get_symbolic_relations(self) -> List[Any]:
        """Retrieve a ``SymbolicDataModel`` per relation label in the graph.

        Each returned schema includes its endpoint node schemas under
        ``$defs`` and references them as ``subj`` / ``obj`` via
        ``$ref`` — same shape Pydantic v2 emits for a hand-written
        `synalinks.Relation` subclass.

        Returns:
            list[SymbolicDataModel]: one per existing relation label.
        """
        self._require_graph_adapter()
        return self.graph_adapter.get_symbolic_relations()

    async def detect_communities(
        self,
        *,
        algorithm: str = "louvain",
        node_labels: Optional[List[str]] = None,
        rel_labels: Optional[List[str]] = None,
        max_iterations: Optional[int] = None,
    ) -> Any:
        """Run a community-detection algorithm on the graph store.

        Returns a `KnowledgeGraphs` — one
        `KnowledgeGraph` per detected community. Edges that
        straddle communities are dropped. See the adapter's
        documentation for algorithm-specific constraints (Louvain
        requires a single node label; WCC / SCC accept any number).

        Args:
            algorithm: ``"louvain"`` (default),
                ``"weakly_connected_components"``, or
                ``"strongly_connected_components"``.
            node_labels: Optional whitelist of NODE tables to
                project. ``None`` = every existing one.
            rel_labels: Optional whitelist of REL tables to project.
                ``None`` = every existing one.
            max_iterations: Optional upper bound on the algorithm's
                iteration count. ``None`` defers to the engine
                default.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.detect_communities(
            algorithm=algorithm,
            node_labels=node_labels,
            rel_labels=rel_labels,
            max_iterations=max_iterations,
        )

    async def pagerank(
        self,
        *,
        node_labels: Optional[List[str]] = None,
        rel_labels: Optional[List[str]] = None,
        damping_factor: float = 0.85,
        max_iterations: int = 100,
        tolerance: Optional[float] = None,
        normalize_initial: Optional[bool] = None,
        k: Optional[int] = None,
        output_format: str = "json",
    ):
        """Rank entities by PageRank importance on the graph store.

        Returns rows shaped like
        ``{<pk_column>: <pk_value>, "label": <label>, "node": <full node>,
        "rank": <float>}`` sorted by ``rank`` descending. The per-label
        PK column name is preserved verbatim, mirroring
        `entity_similarity_search`.

        Args:
            node_labels: Optional whitelist of NODE tables. ``None``
                projects every existing one.
            rel_labels: Optional whitelist of REL tables. ``None``
                projects every existing one.
            damping_factor: Probability of following an edge vs
                teleporting; 0.85 is the textbook value.
            max_iterations: Upper bound on the number of iterations;
                the run stops there even if it has not converged.
            tolerance: Optional convergence threshold; the algorithm
                stops early when the L1 change between iterations
                falls below this value. ``None`` defers to the
                engine default.
            normalize_initial: Whether to normalize the initial rank
                vector. ``None`` defers to the engine default.
            k: Optional cap on returned rows.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.pagerank(
            node_labels=node_labels,
            rel_labels=rel_labels,
            damping_factor=damping_factor,
            max_iterations=max_iterations,
            tolerance=tolerance,
            normalize_initial=normalize_initial,
            k=k,
            output_format=output_format,
        )

    async def local_graph_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        label: str,
        max_hops: int = 2,
        k: int = 10,
        threshold: Optional[float] = None,
        rel_label: Optional[str] = None,
        ef_search: Optional[int] = None,
    ):
        """GraphRAG-style *local* search on the graph store.

        Vector-matches ``k`` seed entities of ``label``, expands their
        ``max_hops`` undirected neighbourhood, and returns the deduped
        union as a `KnowledgeGraph` — the local context subgraph
        for entity-centric questions ("what does the graph say around
        *these* entities"). See
        `GraphDatabaseAdapter.local_graph_search`.

        Args:
            text_or_texts: Query text (or list); neighbourhoods merge.
            label: Entity label whose vector index seeds the search.
            max_hops: Neighbourhood radius in edges (>= 1, default 2).
            k: Number of seed entities per query text.
            threshold: Optional seed vector-distance ceiling.
            rel_label: Optional rel-label constraint per hop.
            ef_search: Optional HNSW search-depth for the seed lookup.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.local_graph_search(
            text_or_texts,
            label=label,
            max_hops=max_hops,
            k=k,
            threshold=threshold,
            rel_label=rel_label,
            ef_search=ef_search,
        )

    async def build_communities(
        self,
        *,
        algorithm: str = "louvain",
        node_labels: Optional[List[str]] = None,
        rel_labels: Optional[List[str]] = None,
        max_iterations: Optional[int] = None,
        with_pagerank: bool = True,
        damping_factor: float = 0.85,
    ) -> int:
        """Materialize community membership (and PageRank) onto nodes.

        The index-time half of GraphRAG-global: run once after loading
        the graph so `global_graph_search` can read precomputed
        ``community`` / ``rank`` properties instead of re-clustering on
        every query. Idempotent. See
        `GraphDatabaseAdapter.build_communities`.

        Args:
            algorithm: Community-detection algorithm; see
                `detect_communities`.
            node_labels: Optional NODE-table whitelist (``None`` = all).
            rel_labels: Optional REL-table whitelist (``None`` = all).
            max_iterations: Optional clustering iteration cap.
            with_pagerank: Also stamp a PageRank importance score.
            damping_factor: PageRank damping factor.

        Returns:
            (int): the number of nodes stamped.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.build_communities(
            algorithm=algorithm,
            node_labels=node_labels,
            rel_labels=rel_labels,
            max_iterations=max_iterations,
            with_pagerank=with_pagerank,
            damping_factor=damping_factor,
        )

    async def global_graph_search(
        self,
        *,
        node_labels: Optional[List[str]] = None,
        k: int = 10,
        members_per_community: int = 10,
        output_format: str = "json",
    ):
        """GraphRAG-style *global* search on the graph store.

        Rolls up the community / rank properties
        `build_communities` stamped into one aggregate row per
        community (size, total rank, representative members), ordered
        by importance — the theme-centric counterpart to
        `local_graph_search` ("what are the overall patterns
        across the *whole* graph"). Requires `build_communities`
        to have run first. See
        `GraphDatabaseAdapter.global_graph_search`.

        Args:
            node_labels: Optional NODE-table whitelist (``None`` = every
                stamped table).
            k: Maximum number of communities to return.
            members_per_community: Cap on members carried per community.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.global_graph_search(
            node_labels=node_labels,
            k=k,
            members_per_community=members_per_community,
            output_format=output_format,
        )

    def _serialize_models(self, models, key):
        """Serialize a list of DataModels to their symbolic form.

        Shared between ``data_models``, ``entity_models``, and
        ``relation_models`` since each list goes through the same
        symbolic-model conversion before serialization.
        """
        return [
            (
                serialization_lib.serialize_synalinks_object(
                    model.to_symbolic_data_model(
                        name=key + (f"_{i}_" if i > 0 else "_") + self.name
                    )
                )
                if not is_symbolic_data_model(model)
                else serialization_lib.serialize_synalinks_object(model)
            )
            for i, model in enumerate(models)
        ]

    def get_config(self):
        config = {
            "uri": self.uri,
            "graph_uri": self.graph_uri,
            "name": self.name,
            "metric": self.metric,
            "wipe_on_start": self.wipe_on_start,
        }
        data_models_config = {
            "data_models": self._serialize_models(self.data_models, "data_model"),
            "entity_models": self._serialize_models(self.entity_models, "entity_model"),
            "relation_models": self._serialize_models(
                self.relation_models, "relation_model"
            ),
        }
        embedding_model_config = {}
        if self.embedding_model:
            embedding_model_config = {
                "embedding_model": serialization_lib.serialize_synalinks_object(
                    self.embedding_model,
                )
            }
        return {
            **data_models_config,
            **embedding_model_config,
            **config,
        }

    @classmethod
    def from_config(cls, config):
        def _deserialize(items):
            return [
                serialization_lib.deserialize_synalinks_object(item) for item in items
            ]

        data_models = _deserialize(config.pop("data_models", []))
        entity_models = _deserialize(config.pop("entity_models", []))
        relation_models = _deserialize(config.pop("relation_models", []))
        embedding_model = None
        if "embedding_model" in config:
            embedding_model = serialization_lib.deserialize_synalinks_object(
                config.pop("embedding_model"),
            )
        return cls(
            data_models=data_models,
            entity_models=entity_models,
            relation_models=relation_models,
            embedding_model=embedding_model,
            **config,
        )
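
The hybrid relation and path searches in the listing above describe their `rrf_score` as the sum of a subject-side and an object-side hybrid score. As a rough illustration of that arithmetic, the sketch below applies the textbook reciprocal-rank-fusion formula, 1 / (k_rank + rank), in plain Python. It is not the adapter's implementation: the helper names (`rrf_scores`, `edge_rrf_scores`), the 1-based ranks, and the sample data are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def rrf_scores(rankings: List[List[str]], k_rank: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: each ranked list adds 1 / (k_rank + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, key in enumerate(ranking, start=1):  # ranks are 1-based here
            scores[key] = scores.get(key, 0.0) + 1.0 / (k_rank + rank)
    return scores


def edge_rrf_scores(
    edges: List[Tuple[str, str]],
    subj_rankings: List[List[str]],
    obj_rankings: List[List[str]],
    k_rank: int = 60,
) -> Dict[Tuple[str, str], float]:
    """Sum the subject-side and object-side fused scores for each edge.

    An edge is kept when either endpoint matched (either-endpoint union).
    """
    subj = rrf_scores(subj_rankings, k_rank)
    obj = rrf_scores(obj_rankings, k_rank)
    fused: Dict[Tuple[str, str], float] = {}
    for s, o in edges:
        total = subj.get(s, 0.0) + obj.get(o, 0.0)
        if total > 0.0:
            fused[(s, o)] = total
    return fused


# Hypothetical data: node keys as ranked by a vector branch and a BM25 branch.
edges = [("alice", "acme"), ("bob", "acme"), ("carol", "initech")]
subj_rankings = [["alice", "bob"], ["alice"]]  # [vector, fulltext]
obj_rankings = [["acme"], ["acme"]]            # [vector, fulltext]

scores = edge_rrf_scores(edges, subj_rankings, obj_rankings)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # [('alice', 'acme'), ('bob', 'acme')]
```

With two ranked lists per endpoint, each edge receives up to four reciprocal-rank terms, which is one way to read the "4-source RRF" wording. A larger `k_rank` flattens the difference between adjacent ranks.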

build_communities(*, algorithm='louvain', node_labels=None, rel_labels=None, max_iterations=None, with_pagerank=True, damping_factor=0.85) async

Materialize community membership (and PageRank) onto nodes.

The index-time half of GraphRAG-global: run once after loading the graph so global_graph_search can read precomputed community / rank properties instead of re-clustering on every query. Idempotent. See GraphDatabaseAdapter.build_communities.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `algorithm` | `str` | Community-detection algorithm; see detect_communities. | `'louvain'` |
| `node_labels` | `Optional[List[str]]` | Optional NODE-table whitelist (None = all). | `None` |
| `rel_labels` | `Optional[List[str]]` | Optional REL-table whitelist (None = all). | `None` |
| `max_iterations` | `Optional[int]` | Optional clustering iteration cap. | `None` |
| `with_pagerank` | `bool` | Also stamp a PageRank importance score. | `True` |
| `damping_factor` | `float` | PageRank damping factor. | `0.85` |

Returns:

| Type | Description |
|---|---|
| `int` | The number of nodes stamped. |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def build_communities(
    self,
    *,
    algorithm: str = "louvain",
    node_labels: Optional[List[str]] = None,
    rel_labels: Optional[List[str]] = None,
    max_iterations: Optional[int] = None,
    with_pagerank: bool = True,
    damping_factor: float = 0.85,
) -> int:
    """Materialize community membership (and PageRank) onto nodes.

    The index-time half of GraphRAG-global: run once after loading
    the graph so `global_graph_search` can read precomputed
    ``community`` / ``rank`` properties instead of re-clustering on
    every query. Idempotent. See
    `GraphDatabaseAdapter.build_communities`.

    Args:
        algorithm: Community-detection algorithm; see
            `detect_communities`.
        node_labels: Optional NODE-table whitelist (``None`` = all).
        rel_labels: Optional REL-table whitelist (``None`` = all).
        max_iterations: Optional clustering iteration cap.
        with_pagerank: Also stamp a PageRank importance score.
        damping_factor: PageRank damping factor.

    Returns:
        (int): the number of nodes stamped.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.build_communities(
        algorithm=algorithm,
        node_labels=node_labels,
        rel_labels=rel_labels,
        max_iterations=max_iterations,
        with_pagerank=with_pagerank,
        damping_factor=damping_factor,
    )
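
To make the index-time and query-time split concrete, here is a minimal sketch that stamps communities once and then reads the per-community rollup. It uses only the keyword arguments documented for `build_communities` and `global_graph_search`. The wrapper name `refresh_communities`, its `top_k` parameter, and the choice of three members per community are assumptions for illustration.

```python
from typing import Any, Tuple


async def refresh_communities(
    knowledge_base: Any,
    node_label: str,
    *,
    top_k: int = 5,
) -> Tuple[int, Any]:
    """Stamp community/rank properties, then read the per-community rollup."""
    # Index-time step. Louvain is documented as requiring a single node
    # label, so exactly one label is passed here.
    stamped = await knowledge_base.build_communities(
        algorithm="louvain",
        node_labels=[node_label],
        with_pagerank=True,
    )
    # Query-time step. This reads the stamped properties rather than
    # clustering again.
    communities = await knowledge_base.global_graph_search(
        node_labels=[node_label],
        k=top_k,
        members_per_community=3,
    )
    return stamped, communities
```

From an async context, `stamped, rows = await refresh_communities(knowledge_base, "Person")` would run both halves, where `"Person"` is a placeholder label. Because `build_communities` is described as idempotent, re-running it after new nodes are loaded should be safe.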

cypher(query, *, params=None, output_format='json', **kwargs) async

Execute a raw Cypher query against the graph.

The graph-store counterpart to query (which executes SQL). It has a distinct name so that calls stay unambiguous when a KnowledgeBase exposes both the SQL and the graph surface.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | The Cypher query string. | required |
| `params` | `Optional[Dict[str, Any]]` | Optional parameters for parameterized queries. | `None` |
| `output_format` | `str` | "json" (default) or "csv". | `'json'` |
| `**kwargs` | `Any` | Adapter-specific options (e.g. read_only). | `{}` |

Returns:

| Type | Description |
|---|---|
| `Union[List[Dict[str, Any]], str]` | A list of dicts when output_format="json", or a CSV string when output_format="csv". |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def cypher(
    self,
    query: str,
    *,
    params: Optional[Dict[str, Any]] = None,
    output_format: str = "json",
    **kwargs: Any,
) -> Union[List[Dict[str, Any]], str]:
    """Execute a raw Cypher query against the graph.

    The graph-store counterpart to `query` (which executes
    SQL). Kept under a distinct name to avoid ambiguity when the
    KnowledgeBase grows both surfaces.

    Args:
        query: The Cypher query string.
        params: Optional parameters for parameterized queries.
        output_format: ``"json"`` (default) or ``"csv"``.
        **kwargs: Adapter-specific options (e.g. ``read_only``).

    Returns:
        A list of dicts when ``output_format="json"``, or a CSV
        string when ``output_format="csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.cypher(
        query, params=params, output_format=output_format, **kwargs
    )
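
To make the two `output_format` values concrete, the sketch below converts a list of row dicts (the "json" shape) into CSV text with the standard library. The sample rows and the helper name `rows_to_csv` are hypothetical, and the adapter's actual CSV dialect (quoting, line endings) may differ.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text, header taken from the first row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical result rows, shaped like a "json" query result.
json_rows = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
csv_text = rows_to_csv(json_rows)
```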

delete(id_or_ids, *, table_name) async

Delete records by primary key from a single table.

Pass a single id or a list. The FTS / vector indexes for the table are rebuilt afterwards so subsequent search calls don't return ghost rows.

Parameters:

Name Type Description Default
id_or_ids Union[Any, List[Any]]

Primary key value, or a list of values.

required
table_name str

Target table.

required

Returns:

Type Description
int

The number of rows actually deleted (0 if no id matched).

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def delete(
    self,
    id_or_ids: Union[Any, List[Any]],
    *,
    table_name: str,
) -> int:
    """Delete records by primary key from a single table.

    Pass a single id or a list. The FTS / vector indexes for the
    table are rebuilt afterwards so subsequent search calls
    don't return ghost rows.

    Args:
        id_or_ids: Primary key value, or a list of values.
        table_name: Target table.

    Returns:
        The number of rows actually deleted (0 if no id matched).
    """
    return await self.sql_adapter.delete(id_or_ids, table_name=table_name)

delete_entity(id_or_ids, *, label) async

Delete entities by primary key from a label.

Incident relations are removed by the adapter.

Parameters:

Name Type Description Default
id_or_ids Union[Any, List[Any]]

Primary key value, or a list of values.

required
label str

The entity label.

required

Returns:

Type Description
int

The number of entities actually deleted.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def delete_entity(
    self,
    id_or_ids: Union[Any, List[Any]],
    *,
    label: str,
) -> int:
    """Delete entities by primary key from a label.

    Incident relations are removed by the adapter.

    Args:
        id_or_ids: Primary key value, or a list of values.
        label: The entity label.

    Returns:
        The number of entities actually deleted.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.delete_entity(id_or_ids, label=label)

delete_relation(*, label, source_id, target_id) async

Delete a relation between two entities.

Parameters:

Name Type Description Default
label str

The relation label.

required
source_id Any

The subject (source) entity's primary key.

required
target_id Any

The object (target) entity's primary key.

required

Returns:

Type Description
int

The number of edges actually deleted.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def delete_relation(
    self,
    *,
    label: str,
    source_id: Any,
    target_id: Any,
) -> int:
    """Delete a relation between two entities.

    Args:
        label: The relation label.
        source_id: The subject (source) entity's primary key.
        target_id: The object (target) entity's primary key.

    Returns:
        The number of edges actually deleted.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.delete_relation(
        label=label, source_id=source_id, target_id=target_id
    )

detect_communities(*, algorithm='louvain', node_labels=None, rel_labels=None, max_iterations=None) async

Run a community-detection algorithm on the graph store.

Returns a KnowledgeGraphs — one KnowledgeGraph per detected community. Edges that straddle communities are dropped. See the adapter's documentation for algorithm-specific constraints (Louvain requires a single node label; WCC / SCC accept any number).

Parameters:

Name Type Description Default
algorithm str

"louvain" (default), "weakly_connected_components", or "strongly_connected_components".

'louvain'
node_labels Optional[List[str]]

Optional whitelist of NODE tables to project. None = every existing one.

None
rel_labels Optional[List[str]]

Optional whitelist of REL tables to project. None = every existing one.

None
max_iterations Optional[int]

Optional upper bound on the algorithm's iteration count. None defers to the engine default.

None
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def detect_communities(
    self,
    *,
    algorithm: str = "louvain",
    node_labels: Optional[List[str]] = None,
    rel_labels: Optional[List[str]] = None,
    max_iterations: Optional[int] = None,
) -> Any:
    """Run a community-detection algorithm on the graph store.

    Returns a `KnowledgeGraphs` — one
    `KnowledgeGraph` per detected community. Edges that
    straddle communities are dropped. See the adapter's
    documentation for algorithm-specific constraints (Louvain
    requires a single node label; WCC / SCC accept any number).

    Args:
        algorithm: ``"louvain"`` (default),
            ``"weakly_connected_components"``, or
            ``"strongly_connected_components"``.
        node_labels: Optional whitelist of NODE tables to
            project. ``None`` = every existing one.
        rel_labels: Optional whitelist of REL tables to project.
            ``None`` = every existing one.
        max_iterations: Optional upper bound on the algorithm's
            iteration count. ``None`` defers to the engine
            default.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.detect_communities(
        algorithm=algorithm,
        node_labels=node_labels,
        rel_labels=rel_labels,
        max_iterations=max_iterations,
    )
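
The rule that edges straddling communities are dropped can be pictured with plain dictionaries. This only illustrates the partitioning step under an assumed node-to-community assignment; the names `split_by_community` and `assignment` and the toy data are hypothetical and do not reflect the `KnowledgeGraphs` type.

```python
from collections import defaultdict


def split_by_community(assignment, edges):
    """Group nodes by community id and keep only intra-community edges."""
    communities = defaultdict(lambda: {"nodes": [], "edges": []})
    for node, community_id in assignment.items():
        communities[community_id]["nodes"].append(node)
    for src, dst in edges:
        if assignment[src] == assignment[dst]:
            communities[assignment[src]]["edges"].append((src, dst))
        # Otherwise the edge straddles two communities and is dropped.
    return dict(communities)


assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
graphs = split_by_community(assignment, edges)
```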

drop_table(table_name) async

Drop a table from the knowledge base.

Removes the table's rows, FTS index, and HNSW vector index, then drops the table itself. Also forgets the table in the adapter's known-models list.

Parameters:

Name Type Description Default
table_name str

Target table.

required

Returns:

Type Description
bool

True if a table was dropped, False if it didn't exist to begin with.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def drop_table(self, table_name: str) -> bool:
    """Drop a table from the knowledge base.

    Removes the table's rows, FTS index, and HNSW vector index,
    then drops the table itself. Also forgets the table in the
    adapter's known-models list.

    Args:
        table_name: Target table.

    Returns:
        ``True`` if a table was dropped, ``False`` if it didn't
        exist to begin with.
    """
    return await self.sql_adapter.drop_table(table_name)

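entity_fulltext_search(text_or_texts, *, label, k=10, threshold=None, conjunctive=False, bm25_b=None, output_format='json') async

BM25 full-text search over entities of a given label.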

Parameters:

Name Type Description Default
text_or_texts Union[str, List[str]]

Query text or list of query texts.

required
label str

The entity label to search within.

required
k int

Maximum number of results.

10
threshold Optional[float]

Optional minimum BM25 score.

None
conjunctive bool

AND-mode query (every term must match).

False
bm25_b Optional[float]

Optional override for BM25's b parameter.

None
output_format str

"json" (default) or "csv".

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def entity_fulltext_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    label: str,
    k: int = 10,
    threshold: Optional[float] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """BM25 full-text search over entities of a given label.

    Args:
        text_or_texts: Query text or list of query texts.
        label: The entity label to search within.
        k: Maximum number of results.
        threshold: Optional minimum BM25 score.
        conjunctive: AND-mode query (every term must match).
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.entity_fulltext_search(
        text_or_texts,
        label=label,
        k=k,
        threshold=threshold,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )

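entity_hybrid_fts_search(text_or_texts, *, keywords=None, label, k=10, k_rank=60, similarity_threshold=None, fulltext_threshold=None, ef_search=None, conjunctive=False, bm25_b=None, output_format='json') async

RRF of vector similarity + BM25 fulltext over entities of a label.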

Graph-side counterpart of hybrid_fts_search.

Parameters:

Name Type Description Default
text_or_texts Union[str, List[str]]

Query text or list of query texts.

required
keywords Optional[Union[str, List[str]]]

Optional keyword or list of keywords, forwarded unchanged to the graph adapter.

None
label str

The entity label to search within.

required
k int

Maximum number of results.

10
k_rank int

RRF smoothing constant.

60
similarity_threshold Optional[float]

Optional vector-distance threshold.

None
fulltext_threshold Optional[float]

Optional BM25 threshold.

None
ef_search Optional[int]

HNSW efs knob for the vector branch.

None
conjunctive bool

AND vs OR for the BM25 branch.

False
bm25_b Optional[float]

Optional override for BM25's b parameter.

None
output_format str

"json" (default) or "csv".

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def entity_hybrid_fts_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    keywords: Optional[Union[str, List[str]]] = None,
    label: str,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    fulltext_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """RRF of vector similarity + BM25 fulltext over entities of a label.

    Graph-side counterpart of `hybrid_fts_search`.

    Args:
        text_or_texts: Query text or list of query texts.
        keywords: Optional keyword or list of keywords, forwarded
            unchanged to the graph adapter.
        label: The entity label to search within.
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        fulltext_threshold: Optional BM25 threshold.
        ef_search: HNSW ``efs`` knob for the vector branch.
        conjunctive: AND vs OR for the BM25 branch.
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.entity_hybrid_fts_search(
        text_or_texts=text_or_texts,
        label=label,
        keywords=keywords,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        fulltext_threshold=fulltext_threshold,
        ef_search=ef_search,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )
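
Reciprocal Rank Fusion (RRF) combines ranked lists by summing `1 / (k_rank + rank)` for every list in which a result appears. The sketch below applies that standard formula to two hypothetical ranked id lists; it is not the adapter's code, and `rrf_fuse` and the sample ids are assumptions.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k_rank=60, k=10):
    """Fuse several ranked id lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k_rank + rank)
    # Highest fused score first; ties broken by id for determinism.
    ordered = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ordered[:k]


vector_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical vector-branch ranking
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # hypothetical BM25-branch ranking
fused = rrf_fuse([vector_hits, bm25_hits])
```

Results found by both branches rise to the top. A larger `k_rank` flattens the gap between adjacent ranks, so no single branch dominates the fused order.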

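entity_hybrid_regex_search(text_or_texts, *, pattern_or_patterns=None, label, fields=None, case_sensitive=True, k=10, k_rank=60, similarity_threshold=None, output_format='json') async

RRF fusion of vector similarity + regex match over entities.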

Sibling of entity_hybrid_fts_search. Falls through to entity_similarity_search when no patterns are supplied; falls through to entity_regex_search when no embedding model is configured.

Parameters:

Name Type Description Default
text_or_texts Union[str, List[str]]

Query text or list of query texts for the vector branch.

required
pattern_or_patterns Optional[Union[str, List[str]]]

Regex pattern (or list) for the regex branch. None skips the regex side.

None
label str

The entity label.

required
fields Optional[List[str]]

Forwarded to entity_regex_search.

None
case_sensitive bool

Forwarded to entity_regex_search.

True
k int

Maximum number of results.

10
k_rank int

RRF smoothing constant.

60
similarity_threshold Optional[float]

Optional vector-distance threshold.

None
output_format str

"json" (default) or "csv".

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def entity_hybrid_regex_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    pattern_or_patterns: Optional[Union[str, List[str]]] = None,
    label: str,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    output_format: str = "json",
):
    """RRF fusion of vector similarity + regex match over entities.

    Sibling of `entity_hybrid_fts_search`. Falls through
    to `entity_similarity_search` when no patterns are
    supplied; falls through to `entity_regex_search` when
    no embedding model is configured.

    Args:
        text_or_texts: Query text or list of query texts for the
            vector branch.
        pattern_or_patterns: Regex pattern (or list) for the
            regex branch. ``None`` skips the regex side.
        label: The entity label.
        fields: Forwarded to `entity_regex_search`.
        case_sensitive: Forwarded to `entity_regex_search`.
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.entity_hybrid_regex_search(
        text_or_texts=text_or_texts,
        pattern_or_patterns=pattern_or_patterns,
        label=label,
        fields=fields,
        case_sensitive=case_sensitive,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        output_format=output_format,
    )

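entity_regex_search(pattern, *, label, fields=None, case_sensitive=True, k=10, output_format='json') async

Regex search over entities of a label.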

Graph-side counterpart of regex_search. Applies the pattern to every indexed string field on the entity (or to the caller-supplied subset via fields) and returns rows in which at least one of those fields matches.

Parameters:

Name Type Description Default
pattern str

The regex pattern.

required
label str

The entity label to search within.

required
fields Optional[List[str]]

Optional whitelist of fields.

None
case_sensitive bool

When False, matches case-insensitively.

True
k int

Maximum number of rows.

10
output_format str

"json" (default) or "csv".

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def entity_regex_search(
    self,
    pattern: str,
    *,
    label: str,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
    output_format: str = "json",
):
    """Regex search over entities of a label.

    Graph-side counterpart of `regex_search`. Applies the
    pattern to every indexed string field on the entity (or to
    the caller-supplied subset via ``fields``) and returns rows
    in which at least one of those fields matches.

    Args:
        pattern: The regex pattern.
        label: The entity label to search within.
        fields: Optional whitelist of fields.
        case_sensitive: When ``False``, matches case-insensitively.
        k: Maximum number of rows.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.entity_regex_search(
        pattern,
        label=label,
        fields=fields,
        case_sensitive=case_sensitive,
        k=k,
        output_format=output_format,
    )
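
The matching rule (a row is returned when any searched field matches) can be sketched with the standard `re` module. The row dicts, the helper `regex_search_rows`, and the use of `re.search` semantics are assumptions; the graph engine's regex dialect may differ from Python's.

```python
import re


def regex_search_rows(rows, pattern, fields=None, case_sensitive=True, k=10):
    """Return up to k rows where any selected string field matches the pattern."""
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    hits = []
    for row in rows:
        if len(hits) >= k:
            break
        names = fields if fields is not None else list(row)
        # Only string-valued fields are candidates for matching.
        values = [row[name] for name in names if isinstance(row.get(name), str)]
        if any(compiled.search(value) for value in values):
            hits.append(row)
    return hits


rows = [
    {"name": "Ada Lovelace", "city": "London"},
    {"name": "Alan Turing", "city": "Wilmslow"},
    {"name": "Grace Hopper", "city": "Arlington"},
]
by_any_field = regex_search_rows(rows, r"^A")
by_city_only = regex_search_rows(rows, r"^A", fields=["city"])
case_insensitive = regex_search_rows(rows, r"london", case_sensitive=False)
```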

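entity_similarity_search(text_or_texts, *, label, k=10, threshold=None, ef_search=None, output_format='json') async

Vector similarity search over entities of a given label.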

Parameters:

Name Type Description Default
text_or_texts Union[str, List[str]]

Query text or list of query texts.

required
label str

The entity label to search within.

required
k int

Maximum number of results.

10
threshold Optional[float]

Optional vector-distance threshold.

None
ef_search Optional[int]

Engine-specific search-time recall knob (HNSW efs). Higher = better recall but slower.

None
output_format str

"json" (default) or "csv".

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def entity_similarity_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    label: str,
    k: int = 10,
    threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    output_format: str = "json",
):
    """Vector similarity search over entities of a given label.

    Args:
        text_or_texts: Query text or list of query texts.
        label: The entity label to search within.
        k: Maximum number of results.
        threshold: Optional vector-distance threshold.
        ef_search: Engine-specific search-time recall knob (HNSW
            ``efs``). Higher = better recall but slower.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.entity_similarity_search(
        text_or_texts,
        label=label,
        k=k,
        threshold=threshold,
        ef_search=ef_search,
        output_format=output_format,
    )

from_csv(path, *, table_name=None, table_description=None, delimiter=',', encoding='utf-8', header=True) async

Bulk-load a CSV file directly into the knowledge base.

Skips the Python row pipeline entirely (no Pydantic, no Jinja, no per-row INSERT) and instead delegates to the database's native CSV reader. Roughly two orders of magnitude faster than update(CSVDataset(...)) for non-trivial files — see benchmarks/bench_kb_ingest.py.

The target table's schema is inferred directly from the file's columns, with the first column promoted to PRIMARY KEY. The returned SymbolicDataModel is the handle you pass to subsequent search / get calls — you don't need to pre-declare a DataModel for this table.

Use the streaming update(<...>Dataset(...)) path instead when source rows need transformation before storage (column renames, derived fields, HuggingFace datasets, etc.).

Parameters:

Name Type Description Default
path str

Path to the CSV file.

required
table_name Optional[str]

Target table name. Defaults to the file's stem (/data/my-docs.csv → MyDocs). Whatever value lands here is always normalized to PascalCase.

None
table_description Optional[str]

Optional natural-language description attached to the resulting schema.

None
delimiter str

Field delimiter. Defaults to ",".

','
encoding str

File encoding. Defaults to "utf-8".

'utf-8'
header bool

Whether the first row is a header. Defaults to True.

True

Returns:

Type Description
Any

The SymbolicDataModel for the loaded table.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def from_csv(
    self,
    path: str,
    *,
    table_name: Optional[str] = None,
    table_description: Optional[str] = None,
    delimiter: str = ",",
    encoding: str = "utf-8",
    header: bool = True,
) -> Any:
    """Bulk-load a CSV file directly into the knowledge base.

    Skips the Python row pipeline entirely (no Pydantic, no Jinja,
    no per-row INSERT) and instead delegates to the database's
    native CSV reader. Roughly two orders of magnitude faster than
    ``update(CSVDataset(...))`` for non-trivial files — see
    ``benchmarks/bench_kb_ingest.py``.

    The target table's schema is inferred directly from the
    file's columns, with the first column promoted to PRIMARY
    KEY. The returned `SymbolicDataModel` is the handle
    you pass to subsequent search / get calls — you don't need
    to pre-declare a ``DataModel`` for this table.

    Use the streaming ``update(<...>Dataset(...))`` path instead
    when source rows need transformation before storage (column
    renames, derived fields, HuggingFace datasets, etc.).

    Args:
        path: Path to the CSV file.
        table_name: Target table name. Defaults to the file's stem
            (``/data/my-docs.csv`` → ``MyDocs``). Whatever value
            lands here is always normalized to PascalCase.
        table_description: Optional natural-language description
            attached to the resulting schema.
        delimiter: Field delimiter. Defaults to ``","``.
        encoding: File encoding. Defaults to ``"utf-8"``.
        header: Whether the first row is a header. Defaults to
            ``True``.

    Returns:
        The `SymbolicDataModel` for the loaded table.
    """
    return await self.sql_adapter.from_csv(
        path,
        table_name=table_name,
        table_description=table_description,
        delimiter=delimiter,
        encoding=encoding,
        header=header,
    )
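
The default table name is derived from the file stem and normalized to PascalCase. The helper below is a guess at an equivalent normalization (split on non-alphanumeric characters, capitalize each part); the function name and the exact splitting rule are assumptions, and only the `/data/my-docs.csv` → `MyDocs` mapping comes from the documentation above.

```python
import re
from pathlib import PurePosixPath


def default_table_name(path):
    """Derive a PascalCase table name from a file path's stem."""
    stem = PurePosixPath(path).stem
    parts = [part for part in re.split(r"[^0-9A-Za-z]+", stem) if part]
    # Upper-case only the first character so existing inner capitals survive.
    return "".join(part[0].upper() + part[1:] for part in parts)


name = default_table_name("/data/my-docs.csv")
```

Passing `table_name` explicitly avoids relying on whatever the real normalization does with unusual file names.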

from_json(path, *, table_name=None, table_description=None) async

Bulk-load a JSON file (top-level array of objects).

Same trade-offs as from_csv / from_parquet — bypasses the Python row pipeline. The file must contain a top-level JSON array. Use from_jsonl for the one-object-per-line NDJSON format.

Parameters:

Name Type Description Default
path str

Path to the JSON file.

required
table_name Optional[str]

Target table name. Defaults to the file's stem coerced to PascalCase.

None
table_description Optional[str]

Optional schema description.

None

Returns:

Type Description
Any

The SymbolicDataModel for the loaded table.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def from_json(
    self,
    path: str,
    *,
    table_name: Optional[str] = None,
    table_description: Optional[str] = None,
) -> Any:
    """Bulk-load a JSON file (top-level array of objects).

    Same trade-offs as `from_csv` / `from_parquet` —
    bypasses the Python row pipeline. The file must contain a
    top-level JSON array. Use `from_jsonl` for the
    one-object-per-line NDJSON format.

    Args:
        path: Path to the JSON file.
        table_name: Target table name. Defaults to the file's stem
            coerced to PascalCase.
        table_description: Optional schema description.

    Returns:
        The `SymbolicDataModel` for the loaded table.
    """
    return await self.sql_adapter.from_json(
        path, table_name=table_name, table_description=table_description
    )

from_jsonl(path, *, table_name=None, table_description=None) async

Bulk-load a JSON Lines (NDJSON) file.

Same trade-offs as from_csv / from_parquet, and the right call for very large JSON sources that aren't a single array.

Parameters:

Name Type Description Default
path str

Path to the JSONL file.

required
table_name Optional[str]

Target table name. Defaults to the file's stem coerced to PascalCase.

None
table_description Optional[str]

Optional schema description.

None

Returns:

Type Description
Any

The SymbolicDataModel for the loaded table.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def from_jsonl(
    self,
    path: str,
    *,
    table_name: Optional[str] = None,
    table_description: Optional[str] = None,
) -> Any:
    """Bulk-load a JSON Lines (NDJSON) file.

    Same trade-offs as `from_csv` / `from_parquet`,
    and the right call for very large JSON sources that aren't
    a single array.

    Args:
        path: Path to the JSONL file.
        table_name: Target table name. Defaults to the file's stem
            coerced to PascalCase.
        table_description: Optional schema description.

    Returns:
        The `SymbolicDataModel` for the loaded table.
    """
    return await self.sql_adapter.from_jsonl(
        path, table_name=table_name, table_description=table_description
    )
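
The difference between the file layouts expected by `from_json` and `from_jsonl` is easy to see with the standard `json` module. The sample records are made up, and this only demonstrates the two formats, not how the database reads them.

```python
import json

records = [{"id": "1", "title": "Hello"}, {"id": "2", "title": "World"}]

# JSON: one document whose top level is an array of objects.
json_text = json.dumps(records)

# JSON Lines / NDJSON: one complete JSON object per line, no enclosing array.
jsonl_text = "\n".join(json.dumps(record) for record in records)

rows_from_json = json.loads(json_text)
rows_from_jsonl = [json.loads(line) for line in jsonl_text.splitlines() if line]
```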

from_parquet(path, *, table_name=None, table_description=None) async

Bulk-load a Parquet file directly into the knowledge base.

Same trade-offs as from_csv — bypasses the Python row pipeline for native database ingestion. Parquet's schema is explicit in the file footer so there is no type-inference guesswork to worry about.

Parameters:

Name Type Description Default
path str

Path to the Parquet file.

required
table_name Optional[str]

Target table name. Defaults to the file's stem coerced to PascalCase.

None
table_description Optional[str]

Optional schema description.

None

Returns:

Type Description
Any

The SymbolicDataModel for the loaded table.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def from_parquet(
    self,
    path: str,
    *,
    table_name: Optional[str] = None,
    table_description: Optional[str] = None,
) -> Any:
    """Bulk-load a Parquet file directly into the knowledge base.

    Same trade-offs as `from_csv` — bypasses the Python row
    pipeline for native database ingestion. Parquet's schema is
    explicit in the file footer so there is no type-inference
    guesswork to worry about.

    Args:
        path: Path to the Parquet file.
        table_name: Target table name. Defaults to the file's stem
            coerced to PascalCase.
        table_description: Optional schema description.

    Returns:
        The `SymbolicDataModel` for the loaded table.
    """
    return await self.sql_adapter.from_parquet(
        path, table_name=table_name, table_description=table_description
    )

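fulltext_search(text_or_texts, *, table_name, k=10, threshold=None, conjunctive=False, bm25_b=None, bm25_k=None, output_format='json') async

BM25 full-text search against a single table.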

Parameters:

Name Type Description Default
text_or_texts Union[str, List[str]]

Query text or list of query texts.

required
table_name str

Target table.

required
k int

Maximum number of results.

10
threshold Optional[float]

Optional minimum BM25 score.

None
conjunctive bool

AND-mode query (every term must match). Default False keeps OR semantics.

False
bm25_b Optional[float]

Optional override for BM25's b parameter (document-length normalization).

None
bm25_k Optional[float]

Optional override for BM25's k1 parameter (term-frequency saturation).

None
output_format str

"json" (list of dicts, default) / "csv" (text).

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def fulltext_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    table_name: str,
    k: int = 10,
    threshold: Optional[float] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    bm25_k: Optional[float] = None,
    output_format: str = "json",
):
    """BM25 full-text search against a single table.

    Args:
        text_or_texts: Query text or list of query texts.
        table_name: Target table.
        k: Maximum number of results.
        threshold: Optional minimum BM25 score.
        conjunctive: AND-mode query (every term must match).
            Default ``False`` keeps OR semantics.
        bm25_b: Optional override for BM25's ``b`` parameter
            (document-length normalization).
        bm25_k: Optional override for BM25's ``k1`` parameter
            (term-frequency saturation).
        output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
    """
    return await self.sql_adapter.fulltext_search(
        text_or_texts,
        table_name=table_name,
        k=k,
        threshold=threshold,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        bm25_k=bm25_k,
        output_format=output_format,
    )
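
For readers unfamiliar with the two BM25 knobs, here is a compact, textbook BM25 scorer. It sketches the general formula and is not the database's FTS implementation; the whitespace tokenization, the default values `k1=1.2` and `b=0.75`, and the way `conjunctive` is applied here are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75, conjunctive=False):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(toks) for toks in tokenized) / len(tokenized)
    terms = query.lower().split()
    n_docs = len(docs)
    # Document frequency of each query term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in terms}

    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        if conjunctive and not all(counts[t] for t in terms):
            scores.append(0.0)  # AND mode: every term must be present
            continue
        score = 0.0
        for t in terms:
            tf = counts[t]
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # b scales the length penalty, k1 caps the term-frequency gain.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["hello world", "hello there general", "goodbye world"]
or_scores = bm25_scores("hello world", docs)
and_scores = bm25_scores("hello world", docs, conjunctive=True)
```

With the default OR semantics every document sharing at least one term gets a score; in the conjunctive run only the first document, which contains both terms, keeps a non-zero score.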

get(id_or_ids, *, table_name) async

Retrieve one or more records by primary key from a single table.

Parameters:

Name Type Description Default
id_or_ids Union[Any, List[Any]]

A single primary key value, or a list of values.

required
table_name str

Target table.

required

Returns:

Type Description
Union[Optional[Any], List[Optional[Any]]]

A single JsonDataModel (or None) when called with one id; a list of JsonDataModels (with None in the slots that did not match) when called with a list.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def get(
    self,
    id_or_ids: Union[Any, List[Any]],
    *,
    table_name: str,
) -> Union[Optional[Any], List[Optional[Any]]]:
    """Retrieve one or more records by primary key from a single table.

    Args:
        id_or_ids: A single primary key value, or a list of values.
        table_name: Target table.

    Returns:
        A single JsonDataModel (or ``None``) when called with one id;
        a list of JsonDataModels (with ``None`` in the slots that did
        not match) when called with a list.
    """
    return await self.sql_adapter.get(id_or_ids, table_name=table_name)
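
The scalar-versus-list return shape can be mimicked with a dictionary lookup. This stand-in uses plain dicts instead of `JsonDataModel` instances and a hypothetical in-memory `table`; it only illustrates how misses keep their slot as `None`, and it assumes results follow the order of the requested ids.

```python
from typing import Any, Dict, List, Optional, Union


def get_rows(
    table: Dict[Any, dict], id_or_ids: Union[Any, List[Any]]
) -> Union[Optional[dict], List[Optional[dict]]]:
    """Look up one key, or a list of keys while preserving input order."""
    if isinstance(id_or_ids, list):
        return [table.get(key) for key in id_or_ids]
    return table.get(id_or_ids)


table = {"1": {"id": "1", "title": "Hello"}, "2": {"id": "2", "title": "World"}}
single = get_rows(table, "1")
missing = get_rows(table, "42")
many = get_rows(table, ["2", "42", "1"])
```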

get_entity(id_or_ids, *, label) async

Retrieve one or more entities by primary key from a label.

Parameters:

Name Type Description Default
id_or_ids Union[Any, List[Any]]

A single primary key value, or a list of values.

required
label str

The entity label (node type).

required

Returns:

Type Description
Union[Optional[Any], List[Optional[Any]]]

A single JsonDataModel (or None) for a scalar argument; a list (with None for misses) for a list argument.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def get_entity(
    self,
    id_or_ids: Union[Any, List[Any]],
    *,
    label: str,
) -> Union[Optional[Any], List[Optional[Any]]]:
    """Retrieve one or more entities by primary key from a label.

    Args:
        id_or_ids: A single primary key value, or a list of values.
        label: The entity label (node type).

    Returns:
        A single ``JsonDataModel`` (or ``None``) for a scalar
        argument; a list (with ``None`` for misses) for a list
        argument.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.get_entity(id_or_ids, label=label)

get_symbolic_data_models()

Retrieve all symbolic data models (table definitions) from the database.

Returns a list of SymbolicDataModel objects representing each table in the database. This is useful for introspecting the database schema or for passing to search methods to limit the search scope.

Returns:

Name Type Description
list List[Any]

List of symbolic data models representing the database tables.

Example
symbolic_models = knowledge_base.get_symbolic_data_models()
for model in symbolic_models:
    schema = model.get_schema()
    print(f"Table: {schema['title']}")
    print(f"Fields: {list(schema['properties'].keys())}")
Source code in synalinks/src/knowledge_bases/knowledge_base.py
def get_symbolic_data_models(self) -> List[Any]:
    """Retrieve all symbolic data models (table definitions) from the database.

    Returns a list of SymbolicDataModel objects representing each table
    in the database. This is useful for introspecting the database schema
    or for passing to search methods to limit the search scope.

    Returns:
        list: List of symbolic data models representing the database tables.

    Example:
        ```python
        symbolic_models = knowledge_base.get_symbolic_data_models()
        for model in symbolic_models:
            schema = model.get_schema()
            print(f"Table: {schema['title']}")
            print(f"Fields: {list(schema['properties'].keys())}")
        ```
    """
    return self.sql_adapter.get_symbolic_data_models()

get_symbolic_entities()

Retrieve a SymbolicDataModel per node label in the graph.

Graph-side counterpart of get_symbolic_data_models, split by graph role: returns only entity (node) schemas. Each schema carries a label const discriminator and one property per stored column.

Returns:

Type Description
List[Any]

list[SymbolicDataModel]: one per existing node label.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
def get_symbolic_entities(self) -> List[Any]:
    """Retrieve a ``SymbolicDataModel`` per node label in the graph.

    Graph-side counterpart of `get_symbolic_data_models`,
    split by graph role: returns only entity (node) schemas.
    Each schema carries a ``label`` ``const`` discriminator and
    one property per stored column.

    Returns:
        list[SymbolicDataModel]: one per existing node label.
    """
    self._require_graph_adapter()
    return self.graph_adapter.get_symbolic_entities()

get_symbolic_relations()

Retrieve a SymbolicDataModel per relation label in the graph.

Each returned schema includes its endpoint node schemas under $defs and references them as subj / obj via $ref — same shape Pydantic v2 emits for a hand-written synalinks.Relation subclass.

Returns:

Type Description
List[Any]

list[SymbolicDataModel]: one per existing relation label.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
def get_symbolic_relations(self) -> List[Any]:
    """Retrieve a ``SymbolicDataModel`` per relation label in the graph.

    Each returned schema includes its endpoint node schemas under
    ``$defs`` and references them as ``subj`` / ``obj`` via
    ``$ref`` — same shape Pydantic v2 emits for a hand-written
    `synalinks.Relation` subclass.

    Returns:
        list[SymbolicDataModel]: one per existing relation label.
    """
    self._require_graph_adapter()
    return self.graph_adapter.get_symbolic_relations()
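
The `$defs` / `$ref` layout described above can be illustrated with a plain dictionary. This is a minimal sketch: the `WorksAt`, `Person` and `Company` labels, the `since` property and the `resolve_ref` helper are hypothetical, and the schema is hand-written rather than produced by the graph adapter.

```python
# Hand-written stand-in for one relation schema (all names are illustrative).
relation_schema = {
    "title": "WorksAt",
    "type": "object",
    "$defs": {
        "Person": {
            "title": "Person",
            "type": "object",
            "properties": {"label": {"const": "Person"}, "name": {"type": "string"}},
        },
        "Company": {
            "title": "Company",
            "type": "object",
            "properties": {"label": {"const": "Company"}, "name": {"type": "string"}},
        },
    },
    "properties": {
        "subj": {"$ref": "#/$defs/Person"},
        "label": {"const": "WorksAt"},
        "obj": {"$ref": "#/$defs/Company"},
        "since": {"type": "integer"},
    },
}


def resolve_ref(schema, ref):
    """Follow a local reference such as ``#/$defs/Person`` inside ``schema``."""
    node = schema
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node


# Look up the endpoint node schemas referenced by subj / obj.
endpoints = {
    role: resolve_ref(relation_schema, relation_schema["properties"][role]["$ref"])["title"]
    for role in ("subj", "obj")
}
```

Resolving the two references recovers the endpoint labels without a second round trip to the graph store, assuming the real schemas follow the same layout.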

getall(*, table_name, limit=50, offset=0) async

Retrieve all records from a table with pagination.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| table_name | str | Target table. | required |
| limit | int | Maximum number of records to return (default: 50). | 50 |
| offset | int | Number of records to skip (default: 0). | 0 |

Returns:

| Type | Description |
| --- | --- |
| List[Any] | List of JsonDataModels. |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def getall(
    self,
    *,
    table_name: str,
    limit: int = 50,
    offset: int = 0,
) -> List[Any]:
    """Retrieve all records from a table with pagination.

    Args:
        table_name: Target table.
        limit: Maximum number of records to return (default: 50).
        offset: Number of records to skip (default: 0).

    Returns:
        List of JsonDataModels.
    """
    return await self.sql_adapter.getall(
        table_name=table_name, limit=limit, offset=offset
    )
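
Because `getall` returns at most `limit` rows per call, reading a whole table means advancing `offset` until a short page comes back. The sketch below shows that loop against an in-memory stand-in; `fake_getall`, `fetch_all` and `ROWS` are hypothetical names, and only the keyword signature is taken from the method above.

```python
import asyncio

ROWS = [{"id": str(i)} for i in range(120)]  # stand-in for a stored table


async def fake_getall(*, table_name, limit=50, offset=0):
    """In-memory stand-in with the same keyword signature as ``getall``."""
    return ROWS[offset:offset + limit]


async def fetch_all(getall, table_name, page_size=50):
    """Page through a table until a short (or empty) page is returned."""
    rows, offset = [], 0
    while True:
        page = await getall(table_name=table_name, limit=page_size, offset=offset)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


all_rows = asyncio.run(fetch_all(fake_getall, "Document"))
```

With a real knowledge base, the bound method `knowledge_base.getall` would presumably be passed in place of `fake_getall`.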

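global_graph_search(*, node_labels=None, k=10, members_per_community=10, output_format='json') async

GraphRAG-style global search on the graph store.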

Rolls up the community and rank properties that build_communities stamped on the nodes into one aggregate row per community (size, total rank, representative members), ordered by importance. This is the theme-centric counterpart to local_graph_search ("what are the overall patterns across the whole graph"). Requires build_communities to have run first. See GraphDatabaseAdapter.global_graph_search.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| node_labels | Optional[List[str]] | Optional NODE-table whitelist (None = every stamped table). | None |
| k | int | Maximum number of communities to return. | 10 |
| members_per_community | int | Cap on members carried per community. | 10 |
| output_format | str | "json" (default) or "csv". | 'json' |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def global_graph_search(
    self,
    *,
    node_labels: Optional[List[str]] = None,
    k: int = 10,
    members_per_community: int = 10,
    output_format: str = "json",
):
    """GraphRAG-style *global* search on the graph store.

    Rolls up the community and rank properties that
    `build_communities` stamped on the nodes into one aggregate
    row per community (size, total rank, representative members),
    ordered by importance. This is the theme-centric counterpart
    to `local_graph_search` ("what are the overall patterns
    across the *whole* graph"). Requires `build_communities`
    to have run first. See
    `GraphDatabaseAdapter.global_graph_search`.

    Args:
        node_labels: Optional NODE-table whitelist (``None`` = every
            stamped table).
        k: Maximum number of communities to return.
        members_per_community: Cap on members carried per community.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.global_graph_search(
        node_labels=node_labels,
        k=k,
        members_per_community=members_per_community,
        output_format=output_format,
    )
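
The roll-up described above can be sketched in plain Python. This is an illustration only: the `community`, `rank` and `name` keys, the `rollup_communities` helper, and the choice of total rank as the ordering key are assumptions, not the adapter's actual query.

```python
from collections import defaultdict


def rollup_communities(nodes, k=10, members_per_community=10):
    """Aggregate stamped nodes into one row per community, most important first."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["community"]].append(node)

    rows = []
    for community, members in groups.items():
        # Highest-ranked members act as the community's representatives.
        members.sort(key=lambda n: n["rank"], reverse=True)
        rows.append(
            {
                "community": community,
                "size": len(members),
                "total_rank": sum(n["rank"] for n in members),
                "members": [n["name"] for n in members[:members_per_community]],
            }
        )
    # Assumed importance measure: summed rank of the community.
    rows.sort(key=lambda r: r["total_rank"], reverse=True)
    return rows[:k]


nodes = [
    {"name": "a", "community": 0, "rank": 0.5},
    {"name": "b", "community": 0, "rank": 0.25},
    {"name": "c", "community": 1, "rank": 0.125},
    {"name": "d", "community": 1, "rank": 0.0625},
    {"name": "e", "community": 1, "rank": 0.03125},
]
rows = rollup_communities(nodes, k=10, members_per_community=2)
```

Here community 0 comes first despite being smaller, because its summed rank is higher.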

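hybrid_fts_search(text_or_texts, *, keywords=None, table_name, k=10, k_rank=60, similarity_threshold=None, fulltext_threshold=None, ef_search=None, conjunctive=False, bm25_b=None, bm25_k=None, output_format='json') async

Reciprocal-Rank-Fusion of vector similarity + BM25 fulltext.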

Falls back to full-text-only when no embedding model is configured. The regex-side sibling is hybrid_regex_search.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text_or_texts | Union[str, List[str]] | Query text or list of query texts. | required |
| keywords | Optional[Union[str, List[str]]] | Optional keyword query (or list), forwarded to the SQL adapter's hybrid_fts_search. | None |
| table_name | str | Target table. | required |
| k | int | Maximum results. | 10 |
| k_rank | int | RRF smoothing constant. Lower emphasizes top ranks more strongly (default: 60). | 60 |
| similarity_threshold | Optional[float] | Optional vector-distance threshold. | None |
| fulltext_threshold | Optional[float] | Optional BM25 threshold. | None |
| ef_search | Optional[int] | Forwarded to the vector branch; HNSW search-time candidate-list depth. | None |
| conjunctive | bool | Forwarded to the BM25 branch; AND-mode query. | False |
| bm25_b | Optional[float] | Forwarded to the BM25 branch; document-length normalization override. | None |
| bm25_k | Optional[float] | Forwarded to the BM25 branch; term-frequency saturation override. | None |
| output_format | str | "json" (list of dicts, default) / "csv" (text). | 'json' |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def hybrid_fts_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    keywords: Optional[Union[str, List[str]]] = None,
    table_name: str,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    fulltext_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    bm25_k: Optional[float] = None,
    output_format: str = "json",
):
    """Reciprocal-Rank-Fusion of vector similarity + BM25 fulltext.

    Falls back to full-text-only when no embedding model is
    configured. The regex-side sibling is
    `hybrid_regex_search`.

    Args:
        text_or_texts: Query text or list of query texts.
        keywords: Optional keyword query (or list), forwarded to the
            SQL adapter's ``hybrid_fts_search``.
        table_name: Target table.
        k: Maximum results.
        k_rank: RRF smoothing constant. Lower emphasizes top
            ranks more strongly (default: 60).
        similarity_threshold: Optional vector-distance threshold.
        fulltext_threshold: Optional BM25 threshold.
        ef_search: Forwarded to the vector branch; HNSW
            search-time candidate-list depth.
        conjunctive: Forwarded to the BM25 branch; AND-mode query.
        bm25_b: Forwarded to the BM25 branch; document-length
            normalization override.
        bm25_k: Forwarded to the BM25 branch; term-frequency
            saturation override.
        output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
    """
    return await self.sql_adapter.hybrid_fts_search(
        text_or_texts=text_or_texts,
        table_name=table_name,
        keywords=keywords,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        fulltext_threshold=fulltext_threshold,
        ef_search=ef_search,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        bm25_k=bm25_k,
        output_format=output_format,
    )
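
To make the roles of `bm25_b`, `bm25_k` and `conjunctive` concrete, here is a small BM25 scorer. It is a sketch of the standard formula, not the SQL adapter's implementation: the whitespace tokenizer, the `bm25_scores` helper, and the defaults of 1.2 and 0.75 are assumptions (the method itself defers to the engine when the overrides are `None`).

```python
import math


def bm25_scores(query, docs, bm25_k=1.2, bm25_b=0.75, conjunctive=False):
    """Score each document against the query with the standard BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    terms = query.lower().split()
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)

    scores = []
    for tokens in tokenized:
        # AND mode: a document must contain every query term to score at all.
        if conjunctive and not all(term in tokens for term in terms):
            scores.append(0.0)
            continue
        score = 0.0
        for term in terms:
            tf = tokens.count(term)
            if tf == 0:
                continue
            df = sum(1 for other in tokenized if term in other)
            idf = math.log((len(docs) - df + 0.5) / (df + 0.5) + 1.0)
            # bm25_b scales the length penalty; bm25_k caps the gain from repeats.
            norm = 1.0 - bm25_b + bm25_b * len(tokens) / avg_len
            score += idf * tf * (bm25_k + 1.0) / (tf + bm25_k * norm)
        scores.append(score)
    return scores


docs = ["hello world", "hello there big wide world of text", "goodbye"]
default_scores = bm25_scores("hello world", docs)
no_length_norm = bm25_scores("hello world", docs, bm25_b=0.0)
and_mode = bm25_scores("hello goodbye", docs, conjunctive=True)
```

With `bm25_b=0.0` the short and long documents tie, since length normalization is switched off; in AND mode no document contains both `hello` and `goodbye`, so every score is zero.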

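hybrid_regex_search(text_or_texts, *, pattern_or_patterns=None, table_name, k=10, k_rank=60, similarity_threshold=None, ef_search=None, fields=None, case_sensitive=True, output_format='json') async

Reciprocal-Rank-Fusion of vector similarity + regex.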

The regex-side counterpart to hybrid_fts_search (which pairs vector with BM25 fulltext). The two signals are orthogonal: vectors capture semantic similarity, regex captures exact textual shape. Ranks are fused with the same RRF formula.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text_or_texts | Union[str, List[str]] | Natural-language query (or list) for the vector side. | required |
| pattern_or_patterns | Union[str, List[str], None] | RE2 pattern (or list) for the regex side. None falls back to plain similarity search. | None |
| table_name | str | Target table. | required |
| k | int | Maximum results. | 10 |
| k_rank | int | RRF smoothing constant. | 60 |
| similarity_threshold | Optional[float] | Vector-distance threshold. | None |
| ef_search | Optional[int] | Forwarded to the vector branch; HNSW search-time candidate-list depth. | None |
| fields | Optional[List[str]] | Forwarded to the regex side. | None |
| case_sensitive | bool | Forwarded to the regex side. | True |
| output_format | str | "json" (list of dicts, default) / "csv" (text). | 'json' |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def hybrid_regex_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    pattern_or_patterns: Union[str, List[str], None] = None,
    table_name: str,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    output_format: str = "json",
):
    """Reciprocal-Rank-Fusion of vector similarity + regex.

    The regex-side counterpart to `hybrid_fts_search` (which
    pairs vector with BM25 fulltext). The two signals are
    orthogonal: vectors capture semantic similarity, regex
    captures exact textual shape. Ranks are fused with the same
    RRF formula.

    Args:
        text_or_texts: Natural-language query (or list) for the
            vector side.
        pattern_or_patterns: RE2 pattern (or list) for the regex
            side. ``None`` falls back to plain similarity search.
        table_name: Target table.
        k: Maximum results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Vector-distance threshold.
        ef_search: Forwarded to the vector branch; HNSW
            search-time candidate-list depth.
        fields: Forwarded to the regex side.
        case_sensitive: Forwarded to the regex side.
        output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
    """
    return await self.sql_adapter.hybrid_regex_search(
        text_or_texts=text_or_texts,
        pattern_or_patterns=pattern_or_patterns,
        table_name=table_name,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        ef_search=ef_search,
        fields=fields,
        case_sensitive=case_sensitive,
        output_format=output_format,
    )
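
The rank fusion shared by the two hybrid methods can be sketched as follows. This is a minimal Reciprocal Rank Fusion over two ranked ID lists; the `rrf_fuse` helper, the 1-based rank convention and the alphabetical tie-break are assumptions, not details confirmed by the adapter.

```python
def rrf_fuse(rankings, k_rank=60, k=10):
    """Fuse ranked ID lists: each list adds 1 / (k_rank + rank) per item."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k_rank + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:k]


vector_ranking = ["a", "b", "c"]  # e.g. nearest neighbours, closest first
regex_ranking = ["c", "a", "d"]   # e.g. rows matching the pattern
fused = rrf_fuse([vector_ranking, regex_ranking])
```

Items that appear in both lists (`a` and `c`) rise above items found by only one signal, which is the point of fusing by rank rather than by raw score.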

hybrid_search(*args, **kwargs) async

Deprecated alias of hybrid_fts_search.

Kept for backwards compatibility. The new name, hybrid_fts_search, mirrors hybrid_regex_search; prefer it in new code.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def hybrid_search(self, *args, **kwargs):
    """Deprecated alias of `hybrid_fts_search`.

    Kept for backwards compatibility. The new name,
    `hybrid_fts_search`, mirrors `hybrid_regex_search`; prefer it
    in new code.
    """
    return await self.hybrid_fts_search(*args, **kwargs)

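local_graph_search(text_or_texts, *, label, max_hops=2, k=10, threshold=None, rel_label=None, ef_search=None) async

GraphRAG-style local search on the graph store.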

Vector-matches k seed entities of label, expands their max_hops undirected neighbourhood, and returns the deduped union as a KnowledgeGraph — the local context subgraph for entity-centric questions ("what does the graph say around these entities"). See GraphDatabaseAdapter.local_graph_search.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text_or_texts | Union[str, List[str]] | Query text (or list); neighbourhoods merge. | required |
| label | str | Entity label whose vector index seeds the search. | required |
| max_hops | int | Neighbourhood radius in edges (>= 1, default 2). | 2 |
| k | int | Number of seed entities per query text. | 10 |
| threshold | Optional[float] | Optional seed vector-distance ceiling. | None |
| rel_label | Optional[str] | Optional rel-label constraint per hop. | None |
| ef_search | Optional[int] | Optional HNSW search-depth for the seed lookup. | None |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def local_graph_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    label: str,
    max_hops: int = 2,
    k: int = 10,
    threshold: Optional[float] = None,
    rel_label: Optional[str] = None,
    ef_search: Optional[int] = None,
):
    """GraphRAG-style *local* search on the graph store.

    Vector-matches ``k`` seed entities of ``label``, expands their
    ``max_hops`` undirected neighbourhood, and returns the deduped
    union as a `KnowledgeGraph` — the local context subgraph
    for entity-centric questions ("what does the graph say around
    *these* entities"). See
    `GraphDatabaseAdapter.local_graph_search`.

    Args:
        text_or_texts: Query text (or list); neighbourhoods merge.
        label: Entity label whose vector index seeds the search.
        max_hops: Neighbourhood radius in edges (>= 1, default 2).
        k: Number of seed entities per query text.
        threshold: Optional seed vector-distance ceiling.
        rel_label: Optional rel-label constraint per hop.
        ef_search: Optional HNSW search-depth for the seed lookup.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.local_graph_search(
        text_or_texts,
        label=label,
        max_hops=max_hops,
        k=k,
        threshold=threshold,
        rel_label=rel_label,
        ef_search=ef_search,
    )
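
The expansion step (seeds, then an undirected neighbourhood of radius `max_hops`, then a deduplicated union) can be sketched with a breadth-first search. The seed lookup by vector similarity is skipped here, and the `expand_neighbourhood` helper and the `(subj, obj)` edge tuples are hypothetical.

```python
from collections import deque


def expand_neighbourhood(edges, seeds, max_hops=2):
    """Return nodes within max_hops of any seed, ignoring edge direction."""
    adjacency = {}
    for subj, obj in edges:
        adjacency.setdefault(subj, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subj)

    hops = {seed: 0 for seed in seeds}  # also deduplicates across seeds
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # radius reached, do not expand further
        for neighbour in adjacency.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)

    # Keep only the edges whose two endpoints were both retained.
    kept_edges = [(s, o) for s, o in edges if s in hops and o in hops]
    return set(hops), kept_edges


edges = [("a", "b"), ("c", "b"), ("c", "d"), ("d", "e")]
nodes, kept = expand_neighbourhood(edges, seeds=["a"], max_hops=2)
```

Node `c` is reached through the edge `("c", "b")` even though it points the other way, which is what an undirected expansion implies.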

pagerank(*, node_labels=None, rel_labels=None, damping_factor=0.85, max_iterations=100, tolerance=None, normalize_initial=None, k=None, output_format='json') async

Rank entities by PageRank importance on the graph store.

Returns rows shaped like {<pk_column>: <pk_value>, "label": <label>, "node": <full node>, "rank": <float>} sorted by rank descending. The per-label PK column name is preserved verbatim, mirroring entity_similarity_search.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| node_labels | Optional[List[str]] | Optional whitelist of NODE tables. None projects every existing one. | None |
| rel_labels | Optional[List[str]] | Optional whitelist of REL tables. None projects every existing one. | None |
| damping_factor | float | Probability of following an edge vs teleporting; 0.85 is the textbook value. | 0.85 |
| max_iterations | int | Upper bound on iterations before convergence. | 100 |
| tolerance | Optional[float] | Optional convergence threshold; the algorithm stops early when the L1 change between iterations falls below this value. None defers to the engine default. | None |
| normalize_initial | Optional[bool] | Whether to normalize the initial rank vector. None defers to the engine default. | None |
| k | Optional[int] | Optional cap on returned rows. | None |
| output_format | str | "json" (default) or "csv". | 'json' |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def pagerank(
    self,
    *,
    node_labels: Optional[List[str]] = None,
    rel_labels: Optional[List[str]] = None,
    damping_factor: float = 0.85,
    max_iterations: int = 100,
    tolerance: Optional[float] = None,
    normalize_initial: Optional[bool] = None,
    k: Optional[int] = None,
    output_format: str = "json",
):
    """Rank entities by PageRank importance on the graph store.

    Returns rows shaped like
    ``{<pk_column>: <pk_value>, "label": <label>, "node": <full node>,
    "rank": <float>}`` sorted by ``rank`` descending. The per-label
    PK column name is preserved verbatim, mirroring
    `entity_similarity_search`.

    Args:
        node_labels: Optional whitelist of NODE tables. ``None``
            projects every existing one.
        rel_labels: Optional whitelist of REL tables. ``None``
            projects every existing one.
        damping_factor: Probability of following an edge vs
            teleporting; 0.85 is the textbook value.
        max_iterations: Upper bound on iterations before
            convergence.
        tolerance: Optional convergence threshold; the algorithm
            stops early when the L1 change between iterations
            falls below this value. ``None`` defers to the
            engine default.
        normalize_initial: Whether to normalize the initial rank
            vector. ``None`` defers to the engine default.
        k: Optional cap on returned rows.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.pagerank(
        node_labels=node_labels,
        rel_labels=rel_labels,
        damping_factor=damping_factor,
        max_iterations=max_iterations,
        tolerance=tolerance,
        normalize_initial=normalize_initial,
        k=k,
        output_format=output_format,
    )
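
For readers unfamiliar with how `damping_factor`, `max_iterations` and `tolerance` interact, here is a textbook power-iteration PageRank. It is a sketch under stated assumptions: the graph engine's own algorithm may differ, and `pagerank_sketch`, the uniform initial vector and the uniform redistribution of dangling-node rank are choices made for this example.

```python
def pagerank_sketch(edges, damping_factor=0.85, max_iterations=100, tolerance=1e-9):
    """Power-iteration PageRank over directed (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        # Rank held by nodes without out-links is spread over every node.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping_factor * rank[src] / len(out_links[src])
        # L1 change between iterations drives the early stop.
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tolerance:
            break
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)


ranks = pagerank_sketch([("a", "c"), ("b", "c"), ("c", "a")])
```

Node `c` receives links from both `a` and `b`, so it ranks first; `b` has no incoming links and keeps only the teleport share.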

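path_fulltext_search(subj_text_or_texts, obj_text_or_texts, *, subj_label, obj_label, label=None, min_hops=1, max_hops=3, k=10, threshold=None, conjunctive=False, bm25_b=None, output_format='json') async

BM25 variable-length path search, AND semantics.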

Same shape as path_similarity_search but driven by BM25 fulltext on each endpoint. Per matched path, score is the sum of the subject-side and object-side BM25 scores.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| subj_text_or_texts | Union[str, List[str]] | Keyword query (or list) for the subject. | required |
| obj_text_or_texts | Union[str, List[str]] | Keyword query (or list) for the object. | required |
| subj_label | str | Entity label of the subject endpoint. | required |
| obj_label | str | Entity label of the object endpoint. | required |
| label | Optional[str] | Optional rel-label constraint for every hop. | None |
| min_hops | int | Minimum hop count, inclusive (default: 1). | 1 |
| max_hops | int | Maximum hop count, inclusive (default: 3). | 3 |
| k | int | Maximum number of results. | 10 |
| threshold | Optional[float] | Optional minimum BM25 threshold per endpoint. | None |
| conjunctive | bool | AND-mode BM25 query. | False |
| bm25_b | Optional[float] | Optional override for BM25's b parameter. | None |
| output_format | str | "json" (default) or "csv". | 'json' |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def path_fulltext_search(
    self,
    subj_text_or_texts: Union[str, List[str]],
    obj_text_or_texts: Union[str, List[str]],
    *,
    subj_label: str,
    obj_label: str,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
    k: int = 10,
    threshold: Optional[float] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """BM25 variable-length path search, AND semantics.

    Same shape as `path_similarity_search` but driven by BM25
    fulltext on each endpoint. Per matched path, ``score`` is the
    sum of the subject-side and object-side BM25 scores.

    Args:
        subj_text_or_texts: Keyword query (or list) for the subject.
        obj_text_or_texts: Keyword query (or list) for the object.
        subj_label: Entity label of the subject endpoint.
        obj_label: Entity label of the object endpoint.
        label: Optional rel-label constraint for every hop.
        min_hops: Minimum hop count, inclusive (default: 1).
        max_hops: Maximum hop count, inclusive (default: 3).
        k: Maximum number of results.
        threshold: Optional minimum BM25 threshold per endpoint.
        conjunctive: AND-mode BM25 query.
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.path_fulltext_search(
        subj_text_or_texts=subj_text_or_texts,
        obj_text_or_texts=obj_text_or_texts,
        subj_label=subj_label,
        obj_label=obj_label,
        label=label,
        min_hops=min_hops,
        max_hops=max_hops,
        k=k,
        threshold=threshold,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )
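
The AND semantics and the summed path score can be sketched independently of the graph engine. In this illustration the endpoint scores are given as plain dictionaries, and `score_paths` and the row keys are hypothetical; only the rule "both endpoints must match, score is the sum" comes from the description above.

```python
def score_paths(paths, subj_scores, obj_scores, k=10, threshold=None):
    """Keep paths whose two endpoints both matched; score = subject + object."""
    rows = []
    for path in paths:
        subj, obj = path[0], path[-1]
        if subj not in subj_scores or obj not in obj_scores:
            continue  # AND semantics: one unmatched endpoint drops the path
        subj_score, obj_score = subj_scores[subj], obj_scores[obj]
        if threshold is not None and min(subj_score, obj_score) < threshold:
            continue  # per-endpoint minimum score
        rows.append(
            {"nodes": path, "length": len(path) - 1, "score": subj_score + obj_score}
        )
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows[:k]


paths = [["alice", "team", "acme"], ["bob", "acme"], ["carol", "acme"]]
subj_scores = {"alice": 2.0, "bob": 0.5}  # BM25 hits on the subject side
obj_scores = {"acme": 1.5}                # BM25 hits on the object side
rows = score_paths(paths, subj_scores, obj_scores)
strict = score_paths(paths, subj_scores, obj_scores, threshold=1.0)
```

The `carol` path is dropped because its subject never matched, and raising `threshold` to 1.0 also removes the `bob` path.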

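path_hybrid_fts_search(subj_text_or_texts, obj_text_or_texts, *, subj_keywords=None, obj_keywords=None, subj_label, obj_label, label=None, min_hops=1, max_hops=3, k=10, k_rank=60, similarity_threshold=None, fulltext_threshold=None, ef_search=None, conjunctive=False, bm25_b=None, output_format='json') async

Hybrid variable-length path search where BOTH endpoints match.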

AND-semantics. Each side is hybrid-searched (vec + fts) independently; per matching path the rrf_score is the sum of the subject-side and object-side hybrid scores. Falls back to fulltext-only when no embedding model is configured.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| subj_text_or_texts | Union[str, List[str]] | Query text (or list) for the subject. | required |
| obj_text_or_texts | Union[str, List[str]] | Query text (or list) for the object. | required |
| subj_keywords | Optional[Union[str, List[str]]] | Optional keyword query (or list) for the subject, forwarded to the graph adapter. | None |
| obj_keywords | Optional[Union[str, List[str]]] | Optional keyword query (or list) for the object, forwarded to the graph adapter. | None |
| subj_label | str | Entity label of the subject endpoint. | required |
| obj_label | str | Entity label of the object endpoint. | required |
| label | Optional[str] | Optional rel-label constraint for every hop. | None |
| min_hops | int | Minimum hop count, inclusive (default: 1). | 1 |
| max_hops | int | Maximum hop count, inclusive (default: 3). | 3 |
| k | int | Maximum number of results. | 10 |
| k_rank | int | RRF smoothing constant. | 60 |
| similarity_threshold | Optional[float] | Optional vector-distance threshold. | None |
| fulltext_threshold | Optional[float] | Optional BM25 score threshold. | None |
| ef_search | Optional[int] | HNSW efs knob applied to both endpoints. | None |
| conjunctive | bool | AND vs OR for the BM25 branch. | False |
| bm25_b | Optional[float] | Optional override for BM25's b parameter. | None |
| output_format | str | "json" (default) or "csv". | 'json' |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def path_hybrid_fts_search(
    self,
    subj_text_or_texts: Union[str, List[str]],
    obj_text_or_texts: Union[str, List[str]],
    *,
    subj_keywords: Optional[Union[str, List[str]]] = None,
    obj_keywords: Optional[Union[str, List[str]]] = None,
    subj_label: str,
    obj_label: str,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    fulltext_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """Hybrid variable-length path search where BOTH endpoints match.

    AND-semantics. Each side is hybrid-searched (vec + fts)
    independently; per matching path the ``rrf_score`` is the
    sum of the subject-side and object-side hybrid scores.
    Falls back to fulltext-only when no embedding model is
    configured.

    Args:
        subj_text_or_texts: Query text (or list) for the subject.
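        obj_text_or_texts: Query text (or list) for the object.
        subj_keywords: Optional keyword query (or list) for the
            subject, forwarded to the graph adapter.
        obj_keywords: Optional keyword query (or list) for the
            object, forwarded to the graph adapter.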
        subj_label: Entity label of the subject endpoint.
        obj_label: Entity label of the object endpoint.
        label: Optional rel-label constraint for every hop.
        min_hops: Minimum hop count, inclusive (default: 1).
        max_hops: Maximum hop count, inclusive (default: 3).
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        fulltext_threshold: Optional BM25 score threshold.
        ef_search: HNSW ``efs`` knob applied to both endpoints.
        conjunctive: AND vs OR for the BM25 branch.
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.path_hybrid_fts_search(
        subj_text_or_texts=subj_text_or_texts,
        obj_text_or_texts=obj_text_or_texts,
        subj_label=subj_label,
        obj_label=obj_label,
        subj_keywords=subj_keywords,
        obj_keywords=obj_keywords,
        label=label,
        min_hops=min_hops,
        max_hops=max_hops,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        fulltext_threshold=fulltext_threshold,
        ef_search=ef_search,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )

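path_hybrid_regex_search(subj_text_or_texts, obj_text_or_texts, *, subj_pattern_or_patterns=None, obj_pattern_or_patterns=None, subj_label, obj_label, label=None, min_hops=1, max_hops=3, k=10, k_rank=60, similarity_threshold=None, fields=None, case_sensitive=True, output_format='json') async

RRF of vector + regex variable-length path search, AND semantics.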

Each side is hybrid-searched (vec + regex) independently; the path's rrf_score is the sum of the two endpoint hybrid scores. Falls through to path_similarity_search when no patterns are supplied.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| subj_text_or_texts | Union[str, List[str]] | Query text (or list) for the subject vector branch. | required |
| obj_text_or_texts | Union[str, List[str]] | Query text (or list) for the object vector branch. | required |
| subj_pattern_or_patterns | Optional[Union[str, List[str]]] | Regex pattern (or list) for the subject. | None |
| obj_pattern_or_patterns | Optional[Union[str, List[str]]] | Regex pattern (or list) for the object. | None |
| subj_label | str | Entity label of the subject endpoint. | required |
| obj_label | str | Entity label of the object endpoint. | required |
| label | Optional[str] | Optional rel-label constraint for every hop. | None |
| min_hops | int | Minimum hop count, inclusive (default: 1). | 1 |
| max_hops | int | Maximum hop count, inclusive (default: 3). | 3 |
| k | int | Maximum number of results. | 10 |
| k_rank | int | RRF smoothing constant. | 60 |
| similarity_threshold | Optional[float] | Optional vector-distance threshold. | None |
| fields | Optional[List[str]] | Forwarded to the regex branch. | None |
| case_sensitive | bool | Forwarded to the regex branch. | True |
| output_format | str | "json" (default) or "csv". | 'json' |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def path_hybrid_regex_search(
    self,
    subj_text_or_texts: Union[str, List[str]],
    obj_text_or_texts: Union[str, List[str]],
    *,
    subj_pattern_or_patterns: Optional[Union[str, List[str]]] = None,
    obj_pattern_or_patterns: Optional[Union[str, List[str]]] = None,
    subj_label: str,
    obj_label: str,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    output_format: str = "json",
):
    """RRF of vector + regex variable-length path search, AND semantics.

    Each side is hybrid-searched (vec + regex) independently; the
    path's ``rrf_score`` is the sum of the two endpoint hybrid
    scores. Falls through to `path_similarity_search` when
    no patterns are supplied.

    Args:
        subj_text_or_texts: Query text (or list) for the subject vector branch.
        obj_text_or_texts: Query text (or list) for the object vector branch.
        subj_pattern_or_patterns: Regex pattern (or list) for the subject.
        obj_pattern_or_patterns: Regex pattern (or list) for the object.
        subj_label: Entity label of the subject endpoint.
        obj_label: Entity label of the object endpoint.
        label: Optional rel-label constraint for every hop.
        min_hops: Minimum hop count, inclusive (default: 1).
        max_hops: Maximum hop count, inclusive (default: 3).
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        fields: Forwarded to the regex branch.
        case_sensitive: Forwarded to the regex branch.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.path_hybrid_regex_search(
        subj_text_or_texts=subj_text_or_texts,
        obj_text_or_texts=obj_text_or_texts,
        subj_pattern_or_patterns=subj_pattern_or_patterns,
        obj_pattern_or_patterns=obj_pattern_or_patterns,
        subj_label=subj_label,
        obj_label=obj_label,
        label=label,
        min_hops=min_hops,
        max_hops=max_hops,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        fields=fields,
        case_sensitive=case_sensitive,
        output_format=output_format,
    )

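path_regex_search(subj_pattern, obj_pattern, *, subj_label, obj_label, label=None, min_hops=1, max_hops=3, k=10, fields=None, case_sensitive=True, output_format='json') async

Regex variable-length path search, AND semantics.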

Both endpoints must match their respective regex pattern. A regex match is binary (match or no match), so paths are ranked by length, shortest first.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| subj_pattern | str | Regex pattern for the subject endpoint. | required |
| obj_pattern | str | Regex pattern for the object endpoint. | required |
| subj_label | str | Entity label of the subject endpoint. | required |
| obj_label | str | Entity label of the object endpoint. | required |
| label | Optional[str] | Optional rel-label constraint for every hop. | None |
| min_hops | int | Minimum hop count, inclusive (default: 1). | 1 |
| max_hops | int | Maximum hop count, inclusive (default: 3). | 3 |
| k | int | Maximum number of results. | 10 |
| fields | Optional[List[str]] | Optional whitelist of fields, applied to both endpoints. | None |
| case_sensitive | bool | When False, matches case-insensitively. | True |
| output_format | str | "json" (default) or "csv". | 'json' |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def path_regex_search(
    self,
    subj_pattern: str,
    obj_pattern: str,
    *,
    subj_label: str,
    obj_label: str,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
    k: int = 10,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    output_format: str = "json",
):
    """Regex variable-length path search, AND semantics.

    Both endpoints must match their respective regex pattern.
    Regex is binary; ranking is by path length (shorter first).

    Args:
        subj_pattern: Regex pattern for the subject endpoint.
        obj_pattern: Regex pattern for the object endpoint.
        subj_label: Entity label of the subject endpoint.
        obj_label: Entity label of the object endpoint.
        label: Optional rel-label constraint for every hop.
        min_hops: Minimum hop count, inclusive (default: 1).
        max_hops: Maximum hop count, inclusive (default: 3).
        k: Maximum number of results.
        fields: Optional whitelist of fields, applied to both endpoints.
        case_sensitive: When ``False``, matches case-insensitively.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.path_regex_search(
        subj_pattern=subj_pattern,
        obj_pattern=obj_pattern,
        subj_label=subj_label,
        obj_label=obj_label,
        label=label,
        min_hops=min_hops,
        max_hops=max_hops,
        k=k,
        fields=fields,
        case_sensitive=case_sensitive,
        output_format=output_format,
    )
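
Since a regex either matches or it does not, the only ordering signal left is path length. The sketch below shows that filter-then-sort behaviour; Python's `re` module stands in for the engine's regex dialect, and `path_regex_rank` plus the representation of paths as lists of node names are assumptions for this example.

```python
import re


def path_regex_rank(paths, subj_pattern, obj_pattern, case_sensitive=True, k=10):
    """Keep paths whose first and last node both match, shortest first."""
    flags = 0 if case_sensitive else re.IGNORECASE
    subj_re = re.compile(subj_pattern, flags)
    obj_re = re.compile(obj_pattern, flags)
    matches = [
        path
        for path in paths
        if subj_re.search(path[0]) and obj_re.search(path[-1])
    ]
    # Hop count is the number of nodes minus one; the sort is stable.
    matches.sort(key=lambda path: len(path) - 1)
    return matches[:k]


paths = [
    ["Alice", "Team", "Acme Corp"],
    ["alice", "Acme Corp"],
    ["Bob", "Acme Corp"],
]
loose = path_regex_rank(paths, "^ali", "Corp$", case_sensitive=False)
strict = path_regex_rank(paths, "^ali", "Corp$")
```

With `case_sensitive=False` both Alice paths survive and the one-hop path comes first; with the default, only the lowercase `alice` path matches.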

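path_similarity_search(subj_text_or_texts, obj_text_or_texts, *, subj_label, obj_label, label=None, min_hops=1, max_hops=3, k=10, subj_threshold=None, obj_threshold=None, ef_search=None, output_format='json') async

Variable-length path search where BOTH endpoints match.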

Returns paths of min_hops..max_hops edges whose start node is vector-close to subj_text_or_texts AND whose end node is vector-close to obj_text_or_texts. label is an optional rel-label constraint applied to every hop; when omitted, any edge type is allowed.

Each row carries the full path: nodes (every node along the way, endpoints included), rels (every edge), and length (hop count), alongside the two endpoint distances and flattened endpoint PKs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| subj_text_or_texts | Union[str, List[str]] | Query text (or list) for the subject. | required |
| obj_text_or_texts | Union[str, List[str]] | Query text (or list) for the object. | required |
| subj_label | str | Entity label of the subject endpoint. | required |
| obj_label | str | Entity label of the object endpoint. | required |
| label | Optional[str] | Optional rel-label constraint for every hop. | None |
| min_hops | int | Minimum hop count, inclusive (default: 1). | 1 |
| max_hops | int | Maximum hop count, inclusive (default: 3). | 3 |
| k | int | Maximum number of results. | 10 |
| subj_threshold | Optional[float] | Optional subject-side distance threshold. | None |
| obj_threshold | Optional[float] | Optional object-side distance threshold. | None |
| ef_search | Optional[int] | Forwarded to the graph adapter; HNSW search-time candidate-list depth for the endpoint lookups. | None |
| output_format | str | "json" (default) or "csv". | 'json' |

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def path_similarity_search(
    self,
    subj_text_or_texts: Union[str, List[str]],
    obj_text_or_texts: Union[str, List[str]],
    *,
    subj_label: str,
    obj_label: str,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
    k: int = 10,
    subj_threshold: Optional[float] = None,
    obj_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    output_format: str = "json",
):
    """Variable-length path search where BOTH endpoints match.

    Returns paths of ``min_hops..max_hops`` edges whose start
    node is vector-close to ``subj_text_or_texts`` AND whose
    end node is vector-close to ``obj_text_or_texts``. ``label``
    is an optional rel-label constraint applied to every hop;
    when omitted, any edge type is allowed.

    Each row carries the full path: ``nodes`` (every node along
    the way, endpoints included), ``rels`` (every edge), and
    ``length`` (hop count), alongside the two endpoint distances
    and flattened endpoint PKs.

    Args:
        subj_text_or_texts: Query text (or list) for the subject.
        obj_text_or_texts: Query text (or list) for the object.
        subj_label: Entity label of the subject endpoint.
        obj_label: Entity label of the object endpoint.
        label: Optional rel-label constraint for every hop.
        min_hops: Minimum hop count, inclusive (default: 1).
        max_hops: Maximum hop count, inclusive (default: 3).
        k: Maximum number of results.
        subj_threshold: Optional subject-side distance threshold.
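        obj_threshold: Optional object-side distance threshold.
        ef_search: Forwarded to the graph adapter; HNSW
            search-time candidate-list depth for the endpoint lookups.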
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.path_similarity_search(
        subj_text_or_texts,
        obj_text_or_texts,
        subj_label=subj_label,
        obj_label=obj_label,
        label=label,
        min_hops=min_hops,
        max_hops=max_hops,
        k=k,
        subj_threshold=subj_threshold,
        obj_threshold=obj_threshold,
        ef_search=ef_search,
        output_format=output_format,
    )
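
The hop window and the per-hop label constraint can be illustrated with a small pure-Python sketch. It is not the adapter's implementation: the toy `GRAPH`, the `find_paths` helper, and the choice to keep paths simple (no repeated nodes) are assumptions made for illustration, and the start/end sets stand in for the nodes that the vector search would select on each side.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy graph: node -> list of (edge_label, target_node).
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "alice": [("KNOWS", "bob")],
    "bob": [("KNOWS", "carol"), ("WORKS_AT", "acme")],
    "carol": [("WORKS_AT", "acme")],
    "acme": [],
}


def find_paths(
    starts: Set[str],
    ends: Set[str],
    *,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
) -> List[dict]:
    """Enumerate paths of min_hops..max_hops edges from a start to an end node."""
    results: List[dict] = []

    def walk(node: str, nodes: List[str], rels: List[str]) -> None:
        hops = len(rels)
        if hops >= min_hops and node in ends:
            results.append({"nodes": nodes, "rels": rels, "length": hops})
        if hops == max_hops:
            return
        for rel, nxt in GRAPH.get(node, []):
            if label is not None and rel != label:
                continue  # the label constraint applies to every hop
            if nxt in nodes:
                continue  # keep paths simple (an assumption of this sketch)
            walk(nxt, nodes + [nxt], rels + [rel])

    for start in starts:
        walk(start, [start], [])
    return results
```

Each returned row mirrors the `nodes` / `rels` / `length` shape described above; the endpoint distances and flattened primary keys are omitted here.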

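regex_search(pattern, *, table_name, fields=None, case_sensitive=True, k=10, output_format='json') async

Find rows whose string fields match a regular expression.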

DuckDB evaluates regexes with RE2, so patterns are linear-time and not vulnerable to catastrophic backtracking.

Parameters:

Name Type Description Default
pattern str

The regex pattern (RE2 syntax).

required
table_name str

Target table.

required
fields Optional[List[str]]

Field names to match against. Defaults to every string field on the schema. Names are snake_case-normalized to match stored column names.

None
case_sensitive bool

When False, match case-insensitively.

True
k int

Maximum number of results.

10
output_format str

"json" (list of dicts, default) / "csv" (text).

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def regex_search(
    self,
    pattern: str,
    *,
    table_name: str,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
    output_format: str = "json",
):
    """Find rows whose string fields match a regular expression.

    DuckDB evaluates regexes with RE2, so patterns are linear-time
    and not vulnerable to catastrophic backtracking.

    Args:
        pattern: The regex pattern (RE2 syntax).
        table_name: Target table.
        fields: Field names to match against. Defaults to every
            string field on the schema. Names are snake_case-
            normalized to match stored column names.
        case_sensitive: When ``False``, match case-insensitively.
        k: Maximum number of results.
        output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
    """
    return await self.sql_adapter.regex_search(
        pattern,
        table_name=table_name,
        fields=fields,
        case_sensitive=case_sensitive,
        k=k,
        output_format=output_format,
    )
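
As a rough illustration of the parameter semantics (not of the DuckDB query itself), the sketch below filters in-memory rows with Python's `re` module. Note the assumptions: `re` is a backtracking engine rather than RE2, the `regex_filter` helper and `ROWS` data are hypothetical, and partial-match (`search`) semantics are assumed.

```python
import re
from typing import Any, Dict, List, Optional

# Hypothetical rows standing in for a stored table.
ROWS: List[Dict[str, Any]] = [
    {"id": "1", "title": "Hello", "content": "Hello World!", "views": 3},
    {"id": "2", "title": "Notes", "content": "hello again", "views": 7},
    {"id": "3", "title": "Other", "content": "nothing here", "views": 1},
]


def regex_filter(
    rows: List[Dict[str, Any]],
    pattern: str,
    *,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
) -> List[Dict[str, Any]]:
    """Return up to k rows where any selected string field matches the pattern."""
    compiled = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    hits: List[Dict[str, Any]] = []
    for row in rows:
        # Default: every string-valued field, mirroring the `fields=None` case.
        names = fields if fields is not None else [
            name for name, value in row.items() if isinstance(value, str)
        ]
        if any(compiled.search(str(row[name])) for name in names if name in row):
            hits.append(row)
        if len(hits) >= k:
            break
    return hits[:k]
```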

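relation_fulltext_search(text_or_texts, *, label, k=10, threshold=None, conjunctive=False, bm25_b=None, output_format='json') async

BM25 fulltext search over relations of a given label.

Results are an either-endpoint union: an edge surfaces if either endpoint matched, and its final score is the sum of the subject-side and object-side BM25 scores.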

Parameters:

Name Type Description Default
text_or_texts Union[str, List[str]]

Query text or list of query texts.

required
label str

The relation label to search within.

required
k int

Maximum number of results.

10
threshold Optional[float]

Optional minimum BM25 threshold applied per endpoint.

None
conjunctive bool

AND-mode query (every term must match).

False
bm25_b Optional[float]

Optional override for BM25's b parameter.

None
output_format str

"json" (default) or "csv".

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def relation_fulltext_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    label: str,
    k: int = 10,
    threshold: Optional[float] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """BM25 fulltext search over relations of a given label.

    Per matched edge, the final ``score`` is the sum of the
    subject-side and object-side BM25 scores — either-endpoint
    union (edge surfaces if either endpoint matched).

    Args:
        text_or_texts: Query text or list of query texts.
        label: The relation label to search within.
        k: Maximum number of results.
        threshold: Optional minimum BM25 threshold applied per endpoint.
        conjunctive: AND-mode query (every term must match).
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.relation_fulltext_search(
        text_or_texts,
        label=label,
        k=k,
        threshold=threshold,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )
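
The either-endpoint union can be sketched in a few lines of plain Python. The `combine_endpoint_scores` helper and the score dictionaries are hypothetical; the real BM25 scores come from the graph adapter, and this sketch only shows how two per-endpoint score maps would be merged into one score per edge.

```python
from typing import Dict, List, Tuple


def combine_endpoint_scores(
    edges: List[Tuple[str, str]],
    subj_scores: Dict[str, float],
    obj_scores: Dict[str, float],
    k: int = 10,
) -> List[dict]:
    """Sum subject-side and object-side scores; keep edges with at least one hit."""
    rows = []
    for subj, obj in edges:
        s = subj_scores.get(subj)
        o = obj_scores.get(obj)
        if s is None and o is None:
            continue  # neither endpoint matched, so the edge does not surface
        rows.append({"subj": subj, "obj": obj, "score": (s or 0.0) + (o or 0.0)})
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows[:k]
```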

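relation_hybrid_fts_search(text_or_texts, *, keywords=None, label, k=10, k_rank=60, similarity_threshold=None, fulltext_threshold=None, ef_search=None, conjunctive=False, bm25_b=None, output_format='json') async

RRF of vector + BM25 fulltext over relations of a label.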

Either-endpoint union: per matched edge, the final rrf_score is the sum of the subject-side and object-side hybrid scores — equivalent to a 4-source RRF. Falls back to fulltext-only when no embedding model is configured.

Parameters:

Name Type Description Default
text_or_texts Union[str, List[str]]

Query text or list of query texts.

required
keywords Optional[Union[str, List[str]]]

Optional keyword string or list of keyword strings, forwarded to the graph adapter.

None
label str

The relation label to search within.

required
k int

Maximum number of results.

10
k_rank int

RRF smoothing constant.

60
similarity_threshold Optional[float]

Optional vector-distance threshold.

None
fulltext_threshold Optional[float]

Optional BM25 score threshold.

None
ef_search Optional[int]

HNSW efs knob for the vector branch.

None
conjunctive bool

AND vs OR for the BM25 branch.

False
bm25_b Optional[float]

Optional override for BM25's b parameter.

None
output_format str

"json" (default) or "csv".

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def relation_hybrid_fts_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    keywords: Optional[Union[str, List[str]]] = None,
    label: str,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    fulltext_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """RRF of vector + BM25 fulltext over relations of a label.

    Either-endpoint union: per matched edge, the final
    ``rrf_score`` is the sum of the subject-side and
    object-side hybrid scores — equivalent to a 4-source RRF.
    Falls back to fulltext-only when no embedding model is
    configured.

    Args:
        text_or_texts: Query text or list of query texts.
        keywords: Optional keyword string or list of keyword strings,
            forwarded to the graph adapter.
        label: The relation label to search within.
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        fulltext_threshold: Optional BM25 score threshold.
        ef_search: HNSW ``efs`` knob for the vector branch.
        conjunctive: AND vs OR for the BM25 branch.
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.relation_hybrid_fts_search(
        text_or_texts=text_or_texts,
        label=label,
        keywords=keywords,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        fulltext_threshold=fulltext_threshold,
        ef_search=ef_search,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )
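
A minimal sketch of the 4-source reduction, assuming the textbook reciprocal rank fusion formula `1 / (k_rank + rank)` with 1-based ranks. The adapter's exact rank convention is not shown in this page, so treat the `rrf` and `edge_rrf` helpers as illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rrf(rankings: Iterable[List[str]], k_rank: int = 60) -> Dict[str, float]:
    """Fuse ranked lists: each appearance contributes 1 / (k_rank + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):  # 1-based ranks (assumed)
            scores[item] += 1.0 / (k_rank + rank)
    return dict(scores)


def edge_rrf(
    edges: List[Tuple[str, str]],
    subj_rankings: List[List[str]],  # e.g. [vector ranking, BM25 ranking]
    obj_rankings: List[List[str]],   # e.g. [vector ranking, BM25 ranking]
    k_rank: int = 60,
    k: int = 10,
) -> List[dict]:
    """Sum the subject-side and object-side hybrid scores for every edge."""
    subj_scores = rrf(subj_rankings, k_rank)
    obj_scores = rrf(obj_rankings, k_rank)
    rows = []
    for subj, obj in edges:
        score = subj_scores.get(subj, 0.0) + obj_scores.get(obj, 0.0)
        if score > 0.0:  # either-endpoint union
            rows.append({"subj": subj, "obj": obj, "rrf_score": score})
    rows.sort(key=lambda row: row["rrf_score"], reverse=True)
    return rows[:k]
```

A larger `k_rank` flattens the difference between top and lower ranks, which is why it is described as a smoothing constant.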

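relation_hybrid_regex_search(text_or_texts, *, pattern_or_patterns=None, label, fields=None, case_sensitive=True, k=10, k_rank=60, similarity_threshold=None, output_format='json') async

RRF of vector similarity + regex match over relations.

Per matched edge, the final rrf_score is the sum of the subject's and the object's hybrid scores, the same 4-source RRF reduction as relation_hybrid_fts_search. Falls back to relation_similarity_search when no patterns are supplied.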

Parameters:

Name Type Description Default
text_or_texts Union[str, List[str]]

Query text or list of query texts for the vector branch.

required
pattern_or_patterns Optional[Union[str, List[str]]]

Regex pattern (or list) for the regex branch.

None
label str

The relation label.

required
fields Optional[List[str]]

Forwarded to entity_regex_search.

None
case_sensitive bool

Forwarded to entity_regex_search.

True
k int

Maximum number of results.

10
k_rank int

RRF smoothing constant.

60
similarity_threshold Optional[float]

Optional vector-distance threshold.

None
output_format str

"json" (default) or "csv".

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def relation_hybrid_regex_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    pattern_or_patterns: Optional[Union[str, List[str]]] = None,
    label: str,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    output_format: str = "json",
):
    """RRF of vector similarity + regex match over relations.

    Per matched edge, the final ``rrf_score`` is the sum of the
    subject's and the object's hybrid scores — same 4-source-RRF
    reduction as `relation_hybrid_fts_search`. Falls through
    to `relation_similarity_search` when no patterns are
    supplied.

    Args:
        text_or_texts: Query text or list of query texts for the vector branch.
        pattern_or_patterns: Regex pattern (or list) for the regex branch.
        label: The relation label.
        fields: Forwarded to `entity_regex_search`.
        case_sensitive: Forwarded to `entity_regex_search`.
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.relation_hybrid_regex_search(
        text_or_texts=text_or_texts,
        pattern_or_patterns=pattern_or_patterns,
        label=label,
        fields=fields,
        case_sensitive=case_sensitive,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        output_format=output_format,
    )

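relation_regex_search(pattern, *, label, fields=None, case_sensitive=True, k=10, output_format='json') async

Regex search over relations of a given label.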

Composed via entity_regex_search on each endpoint. Regex hits are binary; the row's score is 2.0 when both endpoints matched and 1.0 when only one did, with matched_on indicating the side(s).

Parameters:

Name Type Description Default
pattern str

The regex pattern.

required
label str

The relation label to search within.

required
fields Optional[List[str]]

Optional whitelist of fields, applied to both endpoints.

None
case_sensitive bool

When False, matches case-insensitively.

True
k int

Maximum number of rows.

10
output_format str

"json" (default) or "csv".

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def relation_regex_search(
    self,
    pattern: str,
    *,
    label: str,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
    output_format: str = "json",
):
    """Regex search over relations of a given label.

    Composed via `entity_regex_search` on each endpoint.
    Regex hits are binary; the row's ``score`` is 2.0 when both
    endpoints matched and 1.0 when only one did, with
    ``matched_on`` indicating the side(s).

    Args:
        pattern: The regex pattern.
        label: The relation label to search within.
        fields: Optional whitelist of fields, applied to both endpoints.
        case_sensitive: When ``False``, matches case-insensitively.
        k: Maximum number of rows.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.relation_regex_search(
        pattern,
        label=label,
        fields=fields,
        case_sensitive=case_sensitive,
        k=k,
        output_format=output_format,
    )
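
The binary scoring rule can be sketched as follows. This is an illustration under assumptions: the `relation_regex_rows` helper and the `node_text` mapping are hypothetical, Python's `re` stands in for the engine's regex support, and the `matched_on` values reuse the `"subj"` / `"obj"` / `"both"` tags documented for `relation_similarity_search`.

```python
import re
from typing import Dict, List, Tuple


def relation_regex_rows(
    edges: List[Tuple[str, str]],
    node_text: Dict[str, str],
    pattern: str,
    *,
    case_sensitive: bool = True,
    k: int = 10,
) -> List[dict]:
    """Score each edge 1.0 per matching endpoint and tag the matching side(s)."""
    compiled = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    rows = []
    for subj, obj in edges:
        subj_hit = bool(compiled.search(node_text.get(subj, "")))
        obj_hit = bool(compiled.search(node_text.get(obj, "")))
        if not (subj_hit or obj_hit):
            continue
        if subj_hit and obj_hit:
            matched_on = "both"
        else:
            matched_on = "subj" if subj_hit else "obj"
        rows.append(
            {
                "subj": subj,
                "obj": obj,
                "score": float(subj_hit) + float(obj_hit),  # 2.0 or 1.0
                "matched_on": matched_on,
            }
        )
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows[:k]
```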

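relation_similarity_search(text_or_texts, *, label, k=10, threshold=None, ef_search=None, output_format='json') async

Vector similarity search over relations of a given label.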

The query text matches against BOTH endpoints (subject and object); the adapter returns one row per matched edge with its best (lowest) distance and a matched_on tag ("subj", "obj", or "both").

Parameters:

Name Type Description Default
text_or_texts Union[str, List[str]]

Query text or list of query texts.

required
label str

The relation label to search within.

required
k int

Maximum number of results.

10
threshold Optional[float]

Optional vector-distance threshold per endpoint.

None
ef_search Optional[int]

HNSW efs knob applied to both endpoint vector searches.

None
output_format str

"json" (default) or "csv".

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def relation_similarity_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    label: str,
    k: int = 10,
    threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    output_format: str = "json",
):
    """Vector similarity search over relations of a given label.

    The query text matches against BOTH endpoints (subject and
    object); the adapter returns one row per matched edge with
    its best (lowest) distance and a ``matched_on`` tag
    (``"subj"``, ``"obj"``, or ``"both"``).

    Args:
        text_or_texts: Query text or list of query texts.
        label: The relation label to search within.
        k: Maximum number of results.
        threshold: Optional vector-distance threshold per endpoint.
        ef_search: HNSW ``efs`` knob applied to both endpoint
            vector searches.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.relation_similarity_search(
        text_or_texts,
        label=label,
        k=k,
        threshold=threshold,
        ef_search=ef_search,
        output_format=output_format,
    )
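
The "best distance plus `matched_on` tag" reduction might look like the sketch below. The `relation_rows` helper and the precomputed `distances` map are hypothetical, and treating `threshold` as a maximum distance per endpoint is an assumption based on the parameter description.

```python
from typing import Dict, List, Optional, Tuple


def relation_rows(
    edges: List[Tuple[str, str]],
    distances: Dict[str, float],
    threshold: Optional[float] = None,
    k: int = 10,
) -> List[dict]:
    """Keep each edge's lowest endpoint distance and record which side matched."""

    def hit(node: str) -> Optional[float]:
        distance = distances.get(node)
        if distance is None:
            return None
        if threshold is not None and distance > threshold:
            return None  # endpoint is too far from the query
        return distance

    rows = []
    for subj, obj in edges:
        s, o = hit(subj), hit(obj)
        if s is None and o is None:
            continue
        if s is not None and o is not None:
            matched_on = "both"
        else:
            matched_on = "subj" if s is not None else "obj"
        best = min(d for d in (s, o) if d is not None)
        rows.append(
            {"subj": subj, "obj": obj, "distance": best, "matched_on": matched_on}
        )
    rows.sort(key=lambda row: row["distance"])  # lower distance = closer match
    return rows[:k]
```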

rename(source, *, table_name=None, table_description=None) async

Rename a table and/or update its description.

Pass at least one of table_name / table_description. When table_name is given the underlying table is renamed via ALTER TABLE …, the FTS / vector indexes are rebuilt under the new name, and the adapter's known-models list is updated so subsequent default-table searches find the table under its new identity.

Parameters:

Name Type Description Default
source Any

SymbolicDataModel or table-name string for the table to rename. The string form is itself PascalCase-normalized, so callers can pass the same input they used in from_csv (e.g. "my-docs").

required
table_name Optional[str]

New table name. Always normalized to PascalCase.

None
table_description Optional[str]

Optional natural-language description attached to the resulting schema.

None

Returns:

Type Description
Any

A fresh SymbolicDataModel for the (possibly renamed) table.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def rename(
    self,
    source: Any,
    *,
    table_name: Optional[str] = None,
    table_description: Optional[str] = None,
) -> Any:
    """Rename a table and/or update its description.

    Pass at least one of ``table_name`` / ``table_description``.
    When ``table_name`` is given the underlying table is
    renamed via ``ALTER TABLE …``, the FTS / vector indexes are
    rebuilt under the new name, and the adapter's known-models
    list is updated so subsequent default-table searches find
    the table under its new identity.

    Args:
        source: ``SymbolicDataModel`` or table-name string for
            the table to rename. The string form is itself
            PascalCase-normalized, so callers can pass the
            same input they used in `from_csv` (e.g.
            ``"my-docs"``).
        table_name: New table name. Always normalized to
            PascalCase.
        table_description: Optional natural-language description
            attached to the resulting schema.

    Returns:
        A fresh `SymbolicDataModel` for the (possibly
        renamed) table.
    """
    return await self.sql_adapter.rename(
        source,
        table_name=table_name,
        table_description=table_description,
    )
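
To give a feel for the PascalCase normalization mentioned above, here is a minimal sketch. The `to_pascal_case` helper is hypothetical and its splitting rule (break on any non-alphanumeric run, capitalize the first character of each part) is an assumption; the library's own normalizer may treat edge cases differently.

```python
import re


def to_pascal_case(name: str) -> str:
    """Split on non-alphanumeric runs and upper-case the first letter of each part."""
    parts = re.split(r"[^0-9A-Za-z]+", name)
    return "".join(part[:1].upper() + part[1:] for part in parts if part)
```

Under this rule, `"my-docs"` and `"MyDocs"` normalize to the same name, which is what lets a caller pass the original `from_csv` input as `source`.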

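similarity_search(text_or_texts, *, table_name, k=10, threshold=None, ef_search=None, output_format='json') async

Vector similarity search against a single table.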

Parameters:

Name Type Description Default
text_or_texts Union[str, List[str]]

Query text or list of query texts.

required
table_name str

Target table (single-table search).

required
k int

Maximum number of results to return.

10
threshold Optional[float]

Optional maximum vector-distance threshold.

None
ef_search Optional[int]

HNSW search-time candidate-list depth. None keeps the index-time value (or the engine default). Higher = better recall, slower query.

None
output_format str

"json" (default, list of dicts — JSON-shaped Python data) or "csv" (CSV string, useful for handing results to an LM since CSV is ~30-50% fewer tokens than equivalent JSON).

'json'
Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def similarity_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    table_name: str,
    k: int = 10,
    threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    output_format: str = "json",
):
    """Vector similarity search against a single table.

    Args:
        text_or_texts: Query text or list of query texts.
        table_name: Target table (single-table search).
        k: Maximum number of results to return.
        threshold: Optional maximum vector-distance threshold.
        ef_search: HNSW search-time candidate-list depth.
            ``None`` keeps the index-time value (or the engine
            default). Higher = better recall, slower query.
        output_format: ``"json"`` (default, list of dicts —
            JSON-shaped Python data) or ``"csv"`` (CSV string,
            useful for handing results to an LM since CSV is
            ~30-50% fewer tokens than equivalent JSON).
    """
    return await self.sql_adapter.similarity_search(
        text_or_texts,
        table_name=table_name,
        k=k,
        threshold=threshold,
        ef_search=ef_search,
        output_format=output_format,
    )
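
The size difference between the two output formats can be seen with the standard library alone. The sketch below serializes the same hypothetical result rows both ways; it compares character counts, not tokenizer output, so it only hints at the token savings quoted above.

```python
import csv
import io
import json
from typing import Any, Dict, List

# Hypothetical result rows, shaped like a list-of-dicts "json" result.
ROWS: List[Dict[str, Any]] = [
    {"id": "1", "title": "Hello", "distance": 0.12},
    {"id": "2", "title": "Notes", "distance": 0.34},
    {"id": "3", "title": "Other", "distance": 0.56},
]


def rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Render rows as CSV text: one header line, then one line per row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


as_json = json.dumps(ROWS)
as_csv = rows_to_csv(ROWS)
```

CSV states the column names once in the header, while JSON repeats every key in every row, so the gap widens as the number of rows grows.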

sql(sql, *, params=None, output_format='json', **kwargs) async

Execute a raw SQL query against the knowledge base.

Counterpart of cypher — the method is named after the query language so a dual-adapter KnowledgeBase has a clear per-language entry point.

Parameters:

Name Type Description Default
sql str

The SQL string to execute.

required
params dict

Optional dict of parameters for parameterized queries.

None
output_format str

"json" (default, list of dicts — JSON-shaped Python data) or "csv" (CSV string, useful when handing the result to an LM).

'json'
**kwargs Any

Additional options. The most important one is read_only=True/False. When True (the DuckDB adapter's default) two layers of defence apply:

  1. The SQL is parsed with the engine's own parser and any non-SELECT statement is rejected. This catches multi-statement injection (e.g. SELECT 1; DROP TABLE x), COPY ... TO 'file' exfiltration, ATTACH, EXPORT, and other side-effecting statements. This is the only layer that blocks writes — the adapter's underlying connection is read-write (one connection per adapter, reused across operations), so the parser check is what keeps untrusted SQL read-only.
  2. enable_external_access is disabled on that connection at construction time, so SELECT table functions that touch the host filesystem or network — read_csv, read_parquet, read_json, read_blob, read_text, glob and the httpfs/S3 variants — return a permission error instead of leaking files. Without this layer, SELECT * FROM read_csv('/etc/passwd', ...) would pass defence (1) because it is a syntactically valid SELECT.

Pass read_only=False only from trusted call sites that genuinely need to mutate state. Those paths still run on the same sandboxed connection (no external I/O), but they bypass the parser check, so any SQL is accepted — keep them out of the LM-tool-call surface.

{}

Returns:

Type Description
Union[List[Dict[str, Any]], str]

A list of dicts when output_format="json", or a CSV string when output_format="csv".

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def sql(
    self,
    sql: str,
    *,
    params: Optional[Dict[str, Any]] = None,
    output_format: str = "json",
    **kwargs,
) -> Union[List[Dict[str, Any]], str]:
    """Execute a raw SQL query against the knowledge base.

    Counterpart of `cypher` — the method is named after the
    query language so a dual-adapter KnowledgeBase has a clear
    per-language entry point.

    Args:
        sql (str): The SQL string to execute.
        params (dict): Optional dict of parameters for parameterized queries.
        output_format: ``"json"`` (default, list of dicts —
            JSON-shaped Python data) or ``"csv"`` (CSV string,
            useful when handing the result to an LM).
        **kwargs (Any): Additional options. The most important one is
            ``read_only=True/False``. When ``True`` (the DuckDB adapter's
            default) two layers of defence apply:

            1. The SQL is parsed with the engine's own parser and any
               non-``SELECT`` statement is rejected. This catches
               multi-statement injection (e.g. ``SELECT 1; DROP TABLE x``),
               ``COPY ... TO 'file'`` exfiltration, ``ATTACH``, ``EXPORT``,
               and other side-effecting statements. This is the only
               layer that blocks writes — the adapter's underlying
               connection is read-write (one connection per adapter,
               reused across operations), so the parser check is what
               keeps untrusted SQL read-only.
            2. ``enable_external_access`` is disabled on that connection
               at construction time, so ``SELECT`` table functions that
               touch the host filesystem or network — ``read_csv``,
               ``read_parquet``, ``read_json``, ``read_blob``,
               ``read_text``, ``glob`` and the httpfs/S3 variants —
               return a permission error instead of leaking files.
               Without this layer,
               ``SELECT * FROM read_csv('/etc/passwd', ...)`` would pass
               defence (1) because it is a syntactically valid ``SELECT``.

            Pass ``read_only=False`` only from trusted call sites that
            genuinely need to mutate state. Those paths still run on
            the same sandboxed connection (no external I/O), but they
            bypass the parser check, so any SQL is accepted — keep them
            out of the LM-tool-call surface.

    Returns:
        (Union[List[Dict[str, Any]], str]): A list of dicts when
            ``output_format="json"``, or a CSV string when
            ``output_format="csv"``.
    """
    return await self.sql_adapter.sql(
        sql, params=params, output_format=output_format, **kwargs
    )
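
A hedged usage sketch for parameterized queries follows. `StubKnowledgeBase` is a hypothetical stand-in that only records calls, so the snippet runs without a database; with a real `KnowledgeBase` the same `count_by_title` coroutine would be awaited against it. The `$title` placeholder follows DuckDB's named-parameter style, and the assumption that the adapter passes the `params` dict through unchanged is not confirmed by this page.

```python
import asyncio
from typing import Any, Dict, List, Optional


class StubKnowledgeBase:
    """Hypothetical stand-in: records each call instead of querying a database."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    async def sql(
        self,
        sql: str,
        *,
        params: Optional[Dict[str, Any]] = None,
        output_format: str = "json",
        **kwargs: Any,
    ) -> List[Dict[str, Any]]:
        self.calls.append({"sql": sql, "params": params, "kwargs": kwargs})
        return [{"n": 1}]  # canned result for the sketch


async def count_by_title(kb: Any, title: str) -> int:
    # Bind the value through `params` instead of formatting it into the SQL
    # string, so untrusted input never becomes part of the statement text.
    rows = await kb.sql(
        "SELECT count(*) AS n FROM Document WHERE title = $title",
        params={"title": title},
    )
    return rows[0]["n"]
```

Leaving `read_only` at its default keeps the call on the read-only path described above.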

update(data_model_or_data_models, *, verbose='auto') async

Insert or update records in the knowledge base.

Parameters:

Name Type Description Default
data_model_or_data_models JsonDataModel | List[JsonDataModel] | Dataset

A single JsonDataModel, a list of JsonDataModel / DataModel instances, or a synalinks Dataset. The Dataset form streams the source batch-by-batch (one adapter.update call per yielded batch) so memory stays bounded for large CSV / Parquet / HuggingFace sources. The dataset must be inputs-only — no output_template — because the knowledge base stores records, not (input, target) pairs; pass a labeled dataset and you'll get a ValueError.

Upserts key off the first declared field of the model — see the "Primary Key Convention" section on the class docstring for how that's resolved (and why no UUID is injected).

required
verbose int | str

"auto", 0, 1, or 2. Verbosity for the Dataset path; matches the trainer's fit() semantics. "auto" (default) resolves to 1 when a Dataset is passed (a per-batch progress bar — same widget fit() uses, with ETA when len(dataset) is known) and is a no-op for the scalar / list forms, which finish in a single adapter call.

'auto'

Returns:

Type Description
Union[Any, List[Any]]

The primary key value(s) of the inserted/updated records. Scalar in / scalar out; list in / list out; Dataset in / flat list of every batch's ids concatenated.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def update(
    self,
    data_model_or_data_models: Union[Any, List[Any], Dataset],
    *,
    verbose="auto",
) -> Union[Any, List[Any]]:
    """Insert or update records in the knowledge base.

    Args:
        data_model_or_data_models (JsonDataModel | List[JsonDataModel] | Dataset):
            A single ``JsonDataModel``, a list of ``JsonDataModel`` /
            ``DataModel`` instances, or a synalinks ``Dataset``.
            The ``Dataset`` form streams the source batch-by-batch
            (one ``adapter.update`` call per yielded batch) so memory
            stays bounded for large CSV / Parquet / HuggingFace
            sources. The dataset must be inputs-only — no
            ``output_template`` — because the knowledge base stores
            records, not ``(input, target)`` pairs; pass a
            labeled dataset and you'll get a ``ValueError``.

            Upserts key off the first declared field of the model —
            see the "Primary Key Convention" section on the class
            docstring for how that's resolved (and why no UUID is
            injected).
        verbose (int | str): ``"auto"``, ``0``, ``1``, or ``2``.
            Verbosity for the ``Dataset`` path; matches the
            trainer's ``fit()`` semantics. ``"auto"`` (default)
            resolves to ``1`` when a ``Dataset`` is passed (a
            per-batch progress bar — same widget ``fit()`` uses,
            with ETA when ``len(dataset)`` is known) and is a
            no-op for the scalar / list forms, which finish in a
            single adapter call.

    Returns:
        The primary key value(s) of the inserted/updated records.
        Scalar in / scalar out; list in / list out; ``Dataset`` in /
        flat list of every batch's ids concatenated.
    """
    if isinstance(data_model_or_data_models, Dataset):
        return await self._update_from_dataset(
            data_model_or_data_models, verbose=verbose
        )
    return await self.sql_adapter.update(data_model_or_data_models)
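
The upsert contract (key off the first declared field, return shape matching the input) can be sketched with an in-memory table. This is an illustration only: it uses a standard-library dataclass in place of a synalinks `DataModel`, and the `InMemoryTable` class is hypothetical.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, List, Union


@dataclass
class Document:
    id: str  # first declared field, so it acts as the primary key
    title: str
    content: str


class InMemoryTable:
    """Hypothetical store that upserts records keyed on their first field."""

    def __init__(self) -> None:
        self.rows: Dict[Any, Dict[str, Any]] = {}

    def update(self, record_or_records: Union[Any, List[Any]]) -> Union[Any, List[Any]]:
        is_list = isinstance(record_or_records, list)
        records = record_or_records if is_list else [record_or_records]
        keys = []
        for record in records:
            pk_name = fields(record)[0].name  # first field in declaration order
            key = getattr(record, pk_name)
            self.rows[key] = asdict(record)  # insert, or overwrite an existing row
            keys.append(key)
        # Scalar in / scalar out; list in / list out.
        return keys if is_list else keys[0]
```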

update_entities(entity_or_entities) async

Insert or update one or more entities (nodes) in the graph.

Graph-side counterpart of the SQL update. The name mirrors the Entities data model; pass either a single Entity or a list — the return shape matches the input.

Parameters:

Name Type Description Default
entity_or_entities Union[Any, List[Any]]

An Entity instance, or a list of them (or anything satisfying is_entity).

required

Returns:

Type Description
Union[Any, List[Any]]

The node id(s) assigned by the backend. Scalar in / scalar out; list in / list out.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def update_entities(
    self,
    entity_or_entities: Union[Any, List[Any]],
) -> Union[Any, List[Any]]:
    """Insert or update one or more entities (nodes) in the graph.

    Graph-side counterpart of the SQL `update`. The name
    mirrors the `Entities` data model; pass either a single
    ``Entity`` or a list — the return shape matches the input.

    Args:
        entity_or_entities: An ``Entity`` instance, or a list of
            them (or anything satisfying ``is_entity``).

    Returns:
        The node id(s) assigned by the backend. Scalar in / scalar
        out; list in / list out.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.update_entities(entity_or_entities)

update_knowledge_graph(knowledge_graph) async

Bulk-insert a full knowledge graph (entities + relations).

Equivalent to calling update_entities then update_relations, but concrete adapters may optimize the combined path.

Parameters:

Name Type Description Default
knowledge_graph Any

A KnowledgeGraph instance.

required

Returns:

Type Description
Any

A dict with {"entities": [...ids...], "relations": [...ids...]}.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def update_knowledge_graph(self, knowledge_graph: Any) -> Any:
    """Bulk-insert a full knowledge graph (entities + relations).

    Equivalent to calling `update_entities` then
    `update_relations`, but concrete adapters may optimize
    the combined path.

    Args:
        knowledge_graph: A ``KnowledgeGraph`` instance.

    Returns:
        A dict with ``{"entities": [...ids...], "relations":
        [...ids...]}``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.update_knowledge_graph(knowledge_graph)

update_relations(relation_or_relations) async

Insert or update one or more relations (edges) in the graph.

Mirrors the Relations data model. Each relation's subj and obj are upserted as needed so every edge has both endpoints.

Parameters:

Name Type Description Default
relation_or_relations Union[Any, List[Any]]

A Relation instance, or a list of them (or anything satisfying is_relation).

required

Returns:

Type Description
Union[Any, List[Any]]

The edge id(s) assigned by the backend. Scalar in / scalar out; list in / list out.

Source code in synalinks/src/knowledge_bases/knowledge_base.py
async def update_relations(
    self,
    relation_or_relations: Union[Any, List[Any]],
) -> Union[Any, List[Any]]:
    """Insert or update one or more relations (edges) in the graph.

    Mirrors the `Relations` data model. Each relation's
    ``subj`` and ``obj`` are upserted as needed so every edge has
    both endpoints.

    Args:
        relation_or_relations: A ``Relation`` instance, or a list
            of them (or anything satisfying ``is_relation``).

    Returns:
        The edge id(s) assigned by the backend. Scalar in / scalar
        out; list in / list out.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.update_relations(relation_or_relations)
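
How relation upserts pull in their endpoints, and how `update_knowledge_graph` composes the two calls, can be sketched with plain dicts. Everything here is illustrative: `InMemoryGraph` and `entity_key` are hypothetical, entities are plain dicts rather than `Entity` instances, and the returned keys are tuples rather than backend-assigned ids.

```python
from typing import Any, Dict, List, Tuple


def entity_key(entity: Dict[str, Any]) -> Tuple[str, Any]:
    """Primary key = first property after the reserved `label` field."""
    pk_name = next(name for name in entity if name != "label")
    return (entity["label"], entity[pk_name])


class InMemoryGraph:
    def __init__(self) -> None:
        self.nodes: Dict[tuple, Dict[str, Any]] = {}
        self.edges: Dict[tuple, Dict[str, Any]] = {}

    def update_entities(self, entities: List[Dict[str, Any]]) -> List[tuple]:
        keys = []
        for entity in entities:
            key = entity_key(entity)
            self.nodes.setdefault(key, {}).update(entity)  # insert or merge
            keys.append(key)
        return keys

    def update_relations(self, relations: List[Dict[str, Any]]) -> List[tuple]:
        keys = []
        for rel in relations:
            # Upsert both endpoints first so the edge never dangles.
            subj_key, obj_key = self.update_entities([rel["subj"], rel["obj"]])
            key = (subj_key, rel["label"], obj_key)
            self.edges[key] = {
                name: value
                for name, value in rel.items()
                if name not in ("subj", "label", "obj")  # reserved fields
            }
            keys.append(key)
        return keys

    def update_knowledge_graph(self, kg: Dict[str, list]) -> Dict[str, list]:
        return {
            "entities": self.update_entities(kg["entities"]),
            "relations": self.update_relations(kg["relations"]),
        }
```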